The content of the above-referenced applications is incorporated herein by
reference in its entirety.

Field of the Invention
The present invention provides recombinant methods and materials for 
producing polyketides by recombinant DNA technology. The invention relates to 
the fields of agriculture, animal husbandry, chemistry, medicinal chemistry,
medicine, molecular biology, pharmacology, and veterinary technology. 

Background of the Invention 
Polyketides represent a large family of diverse compounds synthesized
from 2-carbon units through a series of condensations and subsequent
modifications. Polyketides occur in many types of organisms, including fungi and
mycelial bacteria, in particular, the actinomycetes. There are a wide variety of
polyketide structures, and the class of polyketides encompasses numerous
compounds with diverse activities. Erythromycin, FK-506, FK-520, megalomicin,
narbomycin, oleandomycin, picromycin, rapamycin, spinocyn, and tylosin are
examples of such compounds. Given the difficulty in producing polyketide
compounds by traditional chemical methodology, and the typically low production
of polyketides in wild-type cells, there has been considerable interest in finding
improved or alternate means to produce polyketide compounds. See PCT
publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; WO 97/02358;
and WO 98/27203; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837;
5,149,639; 5,672,491; and 5,712,146; Fu et al., 1994, Biochemistry 33: 9321-9326;
McDaniel et al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew.
Chem. Int. Ed. Engl. 34(8): 881-888, each of which is incorporated herein by
reference.

WO 01/27284 PCT/US00/27433

Polyketides are synthesized in nature by polyketide synthase (PKS) 
enzymes. These enzymes, which are complexes of multiple large proteins, are 
similar to the synthases that catalyze condensation of 2-carbon units in the
biosynthesis of fatty acids. PKS enzymes are encoded by PKS genes that usually
consist of three or more open reading frames (ORFs). Two major types of PKS 
enzymes are known; these differ in their composition and mode of synthesis. 
These two major types of PKS enzymes are commonly referred to as Type I
("modular") and Type II ("iterative") PKS enzymes.

Modular PKSs are responsible for producing a large number of 12-, 14-, 
and 16-membered macrolide antibiotics including erythromycin, megalomicin,
methymycin, narbomycin, oleandomycin, picromycin, and tylosin. Each ORF of a 
modular PKS can comprise one, two, or more "modules" of ketosynthase activity,
each module of which consists of at least two (if a loading module) and more
typically three (for the simplest extender module) or more enzymatic activities or
"domains." These large multifunctional enzymes (>300 kDa) catalyze the
biosynthesis of polyketide macrolactones through multistep pathways involving
decarboxylative condensations between acyl thioesters followed by cycles of
varying β-carbon processing activities (see O'Hagan, D. The polyketide
metabolites; E. Horwood: New York, 1991, incorporated herein by reference).
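
The module and domain organization described above can be pictured as a small data model. The following is only an illustrative sketch: the class name `Module`, the table `MIN_DOMAINS`, and the particular domain assignments (AT and ACP for a loading module; KS, AT, and ACP for the simplest extender module) are assumptions made for the example and are not taken from the sequence data of the invention.

```python
from dataclasses import dataclass
from typing import Tuple

# Minimum number of domains per module kind, as stated in the text:
# at least two for a loading module, three for the simplest extender module.
MIN_DOMAINS = {"loading": 2, "extender": 3}


@dataclass(frozen=True)
class Module:
    """Hypothetical record for one PKS module (illustration only)."""
    kind: str                 # "loading" or "extender"
    domains: Tuple[str, ...]  # domain abbreviations, e.g. ("KS", "AT", "ACP")

    def is_plausible(self) -> bool:
        """True if the module kind is known and it has enough domains."""
        minimum = MIN_DOMAINS.get(self.kind)
        return minimum is not None and len(self.domains) >= minimum


# Assumed example modules; the domain choices are illustrative.
loading = Module("loading", ("AT", "ACP"))
simplest_extender = Module("extender", ("KS", "AT", "ACP"))
reducing_extender = Module("extender", ("KS", "AT", "KR", "ACP"))
```

The check only counts domains; it does not attempt to verify domain order or catalytic competence.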

During the past half decade, the study of modular PKS function and 
specificity has been greatly facilitated by the plasmid-based Streptomyces 
coelicolor expression system developed with the 6-deoxyerythronolide B (6-dEB)
synthase (DEBS) genes (see Kao et al., 1994, Science 265: 509-512, McDaniel et
al., 1993, Science 262: 1546-1557, and U.S. Patent Nos. 5,672,491 and
5,712,146, each of which is incorporated herein by reference). The advantages of
this plasmid-based genetic system for DEBS are that it overcomes the tedious and
limited techniques for manipulating the natural DEBS host organism,
Saccharopolyspora erythraea, allows more facile construction of recombinant
PKSs, and reduces the complexity of PKS analysis by providing a "clean" host
background. This system also expedited construction of the first combinatorial
modular polyketide library in Streptomyces (see PCT publication No. WO
98/49315, incorporated herein by reference).

The ability to control aspects of polyketide biosynthesis, such as monomer 
selection and degree of β-carbon processing, by genetic manipulation of PKSs has
stimulated great interest in the combinatorial engineering of novel antibiotics (see
Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329; Carreras and Santi, 1998,
Curr. Opin. Biotech. 9: 403-411; and U.S. Patent Nos. 5,712,146 and 5,672,491,
each of which is incorporated herein by reference). This interest has resulted in the
cloning, analysis, and manipulation by recombinant DNA technology of genes that
encode PKS enzymes. The resulting technology allows one to manipulate a known
PKS gene cluster either to produce the polyketide synthesized by that PKS at
higher levels than occur in nature or in hosts that otherwise do not produce the
polyketide. The technology also allows one to produce molecules that are
structurally related to, but distinct from, the polyketides produced from known
PKS gene clusters.

Megalomicin is a macrolide antibiotic produced by Micromonospora 
megalomicea, a member of the Actinomycetales family of soil bacteria that 
produces many types of biologically active compounds. Megalomicin is a 
glycoside of erythromycin A, a widely used antibacterial drug with little or no
antimalarial activity. Megalomicin has antibacterial properties similar to those of
erythromycin, and in 1998, it was discovered also to have potent antiparasitic
activity and low toxicity. The antiparasitic activity may be related to the effect
megalomicin has on protein trafficking in eukaryotes, where it appears to inhibit
vesicular transport between the medial and trans-Golgi, resulting in
under-sialylation of proteins. Hence, megalomicin offers an exciting opportunity to
develop a new class of antiparasitic drugs with a different mechanism of action
than the drugs currently in use and, therefore, possibly active against drug-resistant
forms of Plasmodium falciparum.

The number and diversity of megalomicin derivatives have been limited
due to the inability to manipulate the PKS genes, which have not previously been
available in recombinant form. Genetic systems that allow rapid engineering of the
megalomicin biosynthetic genes would be valuable for creating novel compounds
for pharmaceutical, agricultural, and veterinary applications. The production of
such compounds could be more readily accomplished if the heterologous
expression of the megalomicin biosynthetic genes in Streptomyces coelicolor and 
S. lividans and other host cells were possible. The present invention meets these 
and other needs. 


Summary of the Invention 
The present invention provides recombinant methods and materials for 
expressing PKS enzymes and polyketide modification enzymes derived in whole
or in part from the megalomicin biosynthetic genes in recombinant host cells.
The invention also provides the polyketides produced by such PKS enzymes. The
invention provides in recombinant form all of the genes for the proteins that 
constitute the complete PKS that ultimately results, in Micromonospora 
megalomicea, in the production of megalomicin. Thus, in one embodiment, the 
invention is directed to recombinant materials comprising nucleic acids with
nucleotide sequences encoding at least one domain, module, or protein encoded by
a megalomicin PKS gene. In one preferred embodiment of the invention, the DNA 
compounds of the invention comprise a coding sequence for at least one and 
preferably two or more of the domains of the loading module and extender 
modules 1 through 6, inclusive, of the megalomicin PKS. 

In one embodiment, the invention provides a recombinant expression
vector that comprises a heterologous promoter positioned to drive expression of
one or more of the megalomicin biosynthetic genes. In a preferred embodiment, 
the promoter is derived from another PKS gene. In a related embodiment, the 
invention provides recombinant host cells that comprise one or more such expression
vectors and produce megalomicin or a megalomicin derivative or precursor. In
a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor. 

In another embodiment, the invention provides a recombinant expression 
vector that comprises a promoter positioned to drive expression of a hybrid PKS 
comprising all or part of the megalomicin PKS and at least a part of a second PKS.
In a related embodiment, the invention provides recombinant host cells
comprising the vector that produces the hybrid PKS and its corresponding
polyketide. In a preferred embodiment, the host cell is Streptomyces lividans or S.
coelicolor.

In a related embodiment, the invention provides recombinant materials for
the production of libraries of polyketides wherein the polyketide members of the 
library are synthesized by hybrid PKS enzymes of the invention. The resulting 
polyketides can be further modified to convert them to other useful compounds, 
such as antibiotics, motilides, and antiparasitics, typically through hydroxylation
and/or glycosylation. Modified macrolides provided by the invention that are 
useful intermediates in the preparation of antiparasitics are of particular benefit. 

In another related embodiment, the invention provides a method to prepare 
a nucleic acid that encodes a modified PKS, which method comprises using the 
megalomicin PKS encoding sequence as a scaffold and modifying the portions of
the nucleotide sequence that encode enzymatic activities, either by mutagenesis,
inactivation, deletion, insertion, or replacement. The thus modified megalomicin
PKS encoding nucleotide sequence can then be expressed in a suitable host cell
and the cell employed to produce a polyketide different from that produced by the
megalomicin PKS. In addition, portions of the megalomicin PKS coding sequence
can be inserted into other PKS coding sequences to modify the products thereof.

In another related embodiment, the invention is directed to a multiplicity of 
cell colonies, constituting a library of colonies, wherein each colony of the library 
contains an expression vector for the production of a modular PKS derived in 
whole or in part from the megalomicin PKS. Thus, at least a portion of the
modular PKS is identical to that found in the PKS that produces megalomicin and
is identifiable as such. The derived portion can be prepared synthetically or 
directly from DNA derived from organisms that produce megalomicin. In 
addition, the invention provides methods to screen the resulting polyketide and 
antibiotic libraries.

The invention also provides novel polyketides, motilides, antibiotics, 
antiparasitics and other useful compounds derived therefrom. The compounds of 
the invention can also be used in the manufacture of another compound. In a 
preferred embodiment, the compounds of the invention are formulated in a 
mixture or solution for administration to an animal or human.

In a specific embodiment, the invention provides an isolated nucleic acid 
fragment comprising a nucleotide sequence encoding a domain of megalomicin 
polyketide synthase (PKS) or a megalomicin modification enzyme. The isolated
nucleic acid fragment can be DNA or RNA. Preferably, the isolated nucleic
acid fragment is a recombinant DNA compound. 

The isolated nucleic acid fragment can comprise one, several, or all of
the open reading frames (ORFs) of the megalomicin PKS or a megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acid fragment can
also encode one, several, or all of the domains of the megalomicin PKS.
Exemplary domains of the megalomicin PKS include a TE domain, a KS domain,
an AT domain, an ACP domain, a KR domain, a DH domain and an ER domain.
In a preferred embodiment, the nucleic acid fragment encodes a module of the
megalomicin PKS. In another preferred embodiment, the nucleic acid fragment 
encodes the loading module, a thioesterase domain, and all six extender modules 
of the megalomicin PKS. 

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by the
megF, megBV, megCIII, megK, megDI and megG (renamed megY) genes.
Megalomicin modification enzymes also include those enzymes involved in the
biosynthesis of mycarose, megosamine or desosamine, which are used as
biosynthetic intermediates in the biosynthesis of various megalomicin species and
other related polyketides. The enzymes that are involved in biosynthesis of
mycarose, megosamine or desosamine are described in Figures 5 and 10.

In a preferred embodiment, the invention provides an isolated nucleic acid 
fragment which hybridizes to a nucleic acid having a nucleotide sequence set forth 
in SEQ ID NO:1, under low, medium or high stringency. More preferably, the
nucleic acid fragment comprises, consists of, or consists essentially of a nucleic acid
having a nucleotide sequence set forth in SEQ ID NO:1.

In another specific embodiment, the invention provides a substantially 
purified polypeptide, which is encoded by a nucleic acid fragment comprising a 
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.

In still another specific embodiment, the invention provides an antibody, or 
a fragment or derivative thereof, which immuno-specifically binds to a domain of 
megalomicin polyketide synthase (PKS) or a megalomicin modification enzyme. 
The antibody can be a monoclonal or polyclonal antibody or an antibody fragment.
Preferably, the antibody is a monoclonal antibody.

In yet another specific embodiment, the invention provides a recombinant 
DNA expression vector comprising the recombinant DNA compound encoding at
least a domain of the megalomicin PKS or a megalomicin modification enzyme,
wherein said domain is operably linked to a promoter. Preferably, the
recombinant DNA expression vector further comprises an origin of replication or a
segment of DNA that enables chromosomal integration.

In yet another specific embodiment, the invention provides a recombinant 
host cell comprising the above-described recombinant DNA expression vector 
encoding at least a domain of megalomicin PKS or the megalomicin modification
enzyme. The recombinant host cells can be any suitable host cells including
animal, mammalian, plant, fungal, yeast, and bacterial cells. Preferably, the
recombinant host cells are Streptomyces cells, such as Streptomyces lividans and
S. coelicolor cells, or Saccharopolyspora cells, such as Saccharopolyspora erythraea
cells. Also preferably, the recombinant host cells do not produce megalomicin in
their untransformed, non-recombinant state.

When the recombinant host cell contains nucleic acid encoding more than 
one megalomicin PKS or megalomicin modification enzyme, or domains thereof, 
such nucleic acid material can be located at a single genetic locus, e.g., on a single 
plasmid or at a single chromosomal locus, or at different genetic loci, e.g., on
separate plasmids and/or chromosomal loci. In one example, the invention
provides a recombinant host cell, which comprises at least two separate
autonomously replicating recombinant DNA expression vectors, and each of said
vectors comprises a recombinant DNA compound encoding a megalomicin PKS
domain or a megalomicin modification enzyme operably linked to a promoter. In
another example, the invention provides a recombinant host cell, which comprises
at least one autonomously replicating recombinant DNA expression vector and at
least one modified chromosome, wherein each of said vector(s) and each of said
modified chromosome(s) comprises a recombinant DNA compound encoding a
megalomicin PKS domain or a megalomicin modification enzyme operably linked to a
promoter. Preferably, the autonomously replicating recombinant DNA expression
vector and/or the modified chromosome further comprise distinct selectable
markers.

In a preferred embodiment, the cell comprises three different vectors, one
of which is integrated into the chromosome and two of which are autonomously
replicating, and each of the vectors comprises a meg PKS gene. Optionally, one or 
more of the meg PKS genes contains one or more domain alterations, such as a 
deletion or substitution of a meg PKS domain with a domain from another PKS. 

In yet another specific embodiment, the invention provides a hybrid PKS,
which is produced from a recombinant gene that comprises at least a portion of a
megalomicin PKS gene and at least a portion of a second PKS gene for a
polyketide other than megalomicin. For example, and without limitation, the
second PKS gene can be a narbonolide PKS gene, an oleandolide PKS gene, or a
rapamycin PKS gene. In one embodiment, the hybrid PKS is composed of a
loading module and six extender modules, wherein at least one domain of any one
of extender modules 1 through 6, inclusive, is a domain of an extender module of
megalomicin PKS. In another preferred embodiment, the hybrid PKS comprises a
megalomicin PKS that has a non-functional KS domain in module 1.

In yet another specific embodiment, the invention provides a method of
producing a polyketide, which method comprises growing the recombinant host
cell comprising a recombinant DNA expression vector encoding at least a domain
of the megalomicin PKS or a megalomicin modification enzyme under conditions
whereby the megalomicin PKS domain or the megalomicin modification enzyme
comprised by the recombinant expression vector is produced and the polyketide is
synthesized by the cell, and recovering the synthesized polyketide. Preferably, the
recombinant host cell comprises a recombinant expression vector that encodes at
least a portion of a megAI, megAII, or megAIII gene.

These and other embodiments of the invention are described in more detail
in the following description, the examples, and claims set forth below. 

Brief Description of the Figures 
Figure 1 shows restriction site and function maps of the insert DNA in
cosmids pKOS079-138B, pKOS079-93D, pKOS079-93A, and pKOS079-124B of
the invention. Various restriction sites (XhoI, BglII, NsiI) are also shown. The
location of the megalomicin biosynthetic genes is shown below the solid lines
indicating the cosmid inserts. The genes are shown as arrows pointing in the
direction of transcription. The approximate size (in kilobase (kb) pairs) of the gene
cluster is indicated in 5000 bp (i.e., 5K, 10K, and the like) increments on a solid
bar beneath the arrows indicating the genes.

Figure 2 shows a more detailed map of the megalomicin biosynthetic gene 
cluster. The various open reading frames are shown as arrows pointing in the 
direction of transcription. A line indicates the size in base pairs (in 1000 bp
increments) of the gene cluster. The various domains of the megalomicin PKS are
also shown. Other genes of the megalomicin biosynthetic gene cluster not shown
in this Figure are located in the insert DNA of cosmids pKOS0138B and
pKOS0124B.

Figure 3 shows the structures of the megalomicins, azithromycin and
erythromycin A.

Figure 4 shows the modules and domains of DEBS and the megalomicin
PKS.

Figure 5 shows the compounds and reactions in the erythromycin 
biosynthetic pathway and also for megalomicin biosynthesis. Genes that produce
the various enzymes that catalyze each of the steps in the biosynthetic pathway are 
indicated. 

Figure 6 shows the biosynthetic pathway for the formation of desosamine, 
rhodosamine, and mycarose, as well as the genes that produce the various enzymes 
that catalyze each of the steps in the biosynthetic pathway.

Figure 7 depicts nucleotide and amino acid sequence of Micromonospora 
megalomicea megalomicin biosynthetic genes (GenBank Accession No. 
AF263245, incorporated herein by reference).

Figure 8 depicts the biosynthesis of the erythromycins and megalomicins
and the enzymes that mediate the biosynthesis of each.

Figure 9 depicts the cloned megalomicin biosynthetic gene cluster and
certain cosmids of the invention that comprise portions of the cluster.

Figure 10 depicts the biosynthesis of megosamine, mycarose, and
desosamine.



Detailed Description of the Invention 
The present invention provides useful compounds and methods for 
producing polyketides in recombinant host cells. As used herein, the term
recombinant refers to a compound or composition produced by human
intervention. The invention provides recombinant DNA compounds encoding all
or a portion of the megalomicin biosynthetic genes. The invention provides
recombinant expression vectors useful in producing the megalomicin PKS and
hybrid PKSs composed of a portion of the megalomicin PKS in recombinant host
cells. The invention also provides the polyketides produced by the recombinant 
PKS and polyketide modification enzymes. 

To appreciate the many and diverse benefits and applications of the 
invention, the description of the invention below is organized as follows. In
Section I, common definitions used throughout this application are provided. In
Section II, structural and functional characteristics of megalomicin are described.
In Section III, the recombinant megalomicin biosynthetic genes and other
recombinant nucleic acids provided by the invention are described. In Section IV,
polypeptides and proteins encoded by the megalomicin biosynthetic genes and
antibodies that specifically bind to such polypeptides and proteins provided by the
invention are described. In Section V, methods for heterologous expression of the
megalomicin biosynthetic genes provided by the invention are described. In
Section VI, the hybrid PKS genes provided by the invention are described. In
Section VII, host cells containing multiple megalomicin biosynthetic genes and
nucleic acid fragments on separate expression vectors provided by the invention are
described. In Section VIII, the polyketide compounds provided by the invention 
and pharmaceutical compositions of those compounds are described. The detailed 
description is followed by working examples illustrating the invention.

Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as is commonly understood by one of ordinary skill in the
art to which this invention belongs. All patents, applications, published
applications and other publications and sequences from GenBank and other
databases referred to herein are incorporated by reference in their entirety.

Section I. Definitions

As used herein, domain refers to a portion of a molecule, e.g., proteins or 
nucleic acids, that is structurally and/or functionally distinct from another portion 
of the molecule.

As used herein, antibody includes antibody fragments, such as Fab 
fragments, which are composed of a light chain and the variable region of a heavy 
chain. 

As used herein, biological activity refers to the in vivo activities of a
compound or physiological responses that result upon in vivo administration of a
compound, composition or other mixture. Biological activity, thus, encompasses
therapeutic effects and pharmaceutical activity of such compounds, compositions
and mixtures. Biological activities may be observed in in vitro systems designed
to test or use such activities.

As used herein, a combination refers to any association between two or
among more items.

As used herein, a composition refers to any mixture. It may be a solution,
a suspension, liquid, powder, a paste, aqueous, non-aqueous or any combination
thereof.

As used herein, derivative or analog of a molecule refers to a portion
derived from or a modified version of the molecule.

As used herein, operably linked, operatively linked or operationally 
associated refers to the functional relationship of DNA with regulatory and 
effector sequences of nucleotides, such as promoters, enhancers, transcriptional
and translational stop sites, and other signal sequences. For example, operative
linkage of DNA to a promoter refers to the physical and functional relationship
between the DNA and the promoter such that the transcription of such DNA is
initiated from the promoter by an RNA polymerase that specifically recognizes,
binds to and transcribes the DNA. To optimize expression and/or in vitro
transcription, it may be helpful to remove, add or alter 5' untranslated portions of
the clones to eliminate extra, potentially inappropriate alternative translation
initiation (i.e., start) codons or other sequences that may interfere with or reduce
expression, either at the level of transcription or translation. Alternatively,
consensus ribosome binding sites (see, e.g., Kozak, J. Biol. Chem., 266:19867-19870
(1991)) can be inserted immediately 5' of the start codon and may enhance
expression. The desirability of (or need for) such modification may be empirically
determined.

As used herein, pharmaceutically acceptable salts, esters or other
derivatives of the conjugates include any salts, esters or derivatives that may be
readily prepared by those of skill in this art using known methods for such
derivatization and that produce compounds that may be administered to animals or
humans without substantial toxic effects and that either are pharmaceutically
active or are prodrugs.

As used herein, a promoter region or promoter element refers to a segment 
of DNA or RNA that controls transcription of the DNA or RNA to which it is 
operatively linked. The promoter region includes specific sequences that are 
sufficient for RNA polymerase recognition, binding and transcription initiation.
This portion of the promoter region is referred to as the promoter. In addition, the
promoter region includes sequences that modulate this recognition, binding and
transcription initiation activity of RNA polymerase. These sequences may be cis
acting or may be responsive to trans acting factors. Promoters, depending upon
the nature of the regulation, may be constitutive or regulated.

As used herein, stringency of hybridization in determining percentage
mismatch is as follows: (1) high stringency: 0.1 x SSPE, 0.1% SDS, 65°C; (2)
medium stringency: 0.2 x SSPE, 0.1% SDS, 50°C; and (3) low stringency: 1.0 x
SSPE, 0.1% SDS, 50°C. Equivalent stringencies may be achieved using alternative
buffers, salts and temperatures.
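
The three stringency levels defined above can be written down as a simple lookup table. The following is a minimal sketch for illustration only; the names `WashCondition`, `STRINGENCY`, and `classify` are hypothetical, and the exact-match comparison deliberately ignores the "equivalent stringencies" that the definition also allows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WashCondition:
    """Hypothetical record of one hybridization condition."""
    sspe_x: float       # SSPE concentration as a multiple of 1 x
    sds_percent: float  # SDS concentration in percent
    temp_c: float       # temperature in degrees Celsius


# Values copied from the definition of stringency in the text.
STRINGENCY = {
    "high": WashCondition(0.1, 0.1, 65.0),
    "medium": WashCondition(0.2, 0.1, 50.0),
    "low": WashCondition(1.0, 0.1, 50.0),
}


def classify(sspe_x: float, sds_percent: float, temp_c: float) -> Optional[str]:
    """Return the defined stringency level that exactly matches, else None."""
    probe = WashCondition(sspe_x, sds_percent, temp_c)
    for name, condition in STRINGENCY.items():
        if condition == probe:
            return name
    return None
```

For example, `classify(0.2, 0.1, 50)` returns `"medium"`, while a condition not listed in the definition returns `None` rather than being mapped to a nearest level.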

The term substantially identical or homologous or similar varies with the
context as understood by those skilled in the relevant art and generally means at 
least 70%, preferably means at least 80%, more preferably at least 90%, and most 
preferably at least 95% identity. 
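
The identity thresholds above can be illustrated with a short calculation. This is a sketch under stated assumptions: the two sequences are taken to be already aligned and of equal length, a gap character `-` is counted as a mismatch, and the function names and tier labels are hypothetical. The text does not prescribe a particular alignment or scoring method.

```python
from typing import Optional

# Thresholds from the definition, highest first, with illustrative labels.
TIERS = [
    (95.0, "most preferred"),
    (90.0, "more preferred"),
    (80.0, "preferred"),
    (70.0, "substantially identical"),
]


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    # Assumption: a gap ('-') never counts as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def identity_tier(pct: float) -> Optional[str]:
    """Return the label of the highest threshold met, or None if below 70%."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    return None
```

With the aligned toy sequences `"ACGTACGTAC"` and `"ACGTACGTAA"`, nine of ten positions match, so `percent_identity` gives 90.0 and `identity_tier` reports the 90% tier.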

As used herein, substantially identical to a product means sufficiently
similar so that the property of interest is sufficiently unchanged so that the
substantially identical product can be used in place of the product.

As used herein, isolated means that a substance is either present in a 
preparation at a concentration higher than that at which the substance is found in
nature or in its naturally occurring state or that the substance is present in a
preparation that contains other materials with which the substance is not associated in nature.
As an example of the latter, an isolated meg PKS protein includes a meg PKS 
protein expressed in a Streptomyces coelicolor or S. lividans host cell. 

As used herein, substantially pure means sufficiently homogeneous to 
appear free of readily detectable impurities as determined by standard methods of
analysis, such as thin layer chromatography (TLC), gel electrophoresis and high
performance liquid chromatography (HPLC), used by those of skill in the art to
assess such purity, or sufficiently pure such that further purification would not
detectably alter the physical and chemical properties, such as enzymatic and
biological activities, of the substance. Methods for purification of the compounds
to produce substantially chemically pure compounds are known to those of skill in
the art. A substantially chemically pure compound may, however, be a mixture of
stereoisomers or isomers. In such instances, further purification might increase
the specific activity of the compound.

As used herein, vector or plasmid refers to discrete elements that are used
to introduce heterologous DNA into cells for either expression or replication
thereof. Selection and use of such vehicles are well known within the skill of the
artisan. An expression vector includes vectors capable of expressing DNAs that 
are operatively linked with regulatory sequences, such as promoter regions, that
are capable of effecting expression of such DNA fragments. Thus, an expression
vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage,
recombinant virus or other vector that, upon introduction into an appropriate host
cell, results in expression of the cloned DNA. Appropriate expression vectors are
well known to those of skill in the art and include those that are replicable in
eukaryotic cells and/or prokaryotic cells and those that remain episomal or those
which integrate into the host cell genome. 

Section II. Megalomicins 

The megalomicins were discovered in 1969 at Schering Corp. as
antibacterial agents produced by Micromonospora megalomicea (see Weinstein et
al., 1969, J. Antibiotics 22: 253-258, and U.S. Patent No. 3,632,750, both of
which are incorporated herein by reference). Although the initial structural
assignment was in error, a thorough reassessment of NMR data coupled with an
X-ray crystal structure of a megalomicin A derivative (see Nakagawa and Omura,
"Structure and Stereochemistry of Macrolides" in Macrolide Antibiotics (S.
Omura, ed.), Academic Press, NY, 1984, incorporated herein by reference)
established the structures shown in Figure 3. The megalomicins are
6-O-glycosides of erythromycin C with acetyl or propionyl groups esterified at the 3'''
or 4''' hydroxyls of the mycarose sugar at the C-3 position. The C-6 sugar has
been named "megosamine," although it had been identified 5 to 10 years earlier as
L-rhodosamine or N-dimethyldaunosamine, deoxyamino sugars commonly present
in the anthracycline antitumor drugs. The antibacterial potency, spectrum of
activity, and toxicity (LD50 acute, 7-7.5 g/kg s.c. or oral; subacute, >500 mg/kg) of
the megalomicins are similar to those of erythromycin A.

The megalomicins have two modes of biological activity. As antibacterials,
they act like the erythromycins, which inhibit protein synthesis at the translocation
step by selective binding to the bacterial 50S ribosomal RNA. They also affect
protein trafficking in eukaryotic cells (see Bonay et al., 1996, J. Biol. Chem.
271:3719-3726, incorporated herein by reference). Although the mechanism of
action is not entirely clear, it appears to involve inhibition of vesicular transport
between the medial and trans Golgi, resulting in under-sialylation of proteins. The
megalomicins also strongly inhibit the ATP-dependent acidification of lysosomes
in vivo (see Bonay et al., 1997, J. Cell. Sci. 110:1839-1849, incorporated herein by
reference) and cause an anomalous glycosylation of viral proteins, which may be
responsible for their antiviral activity against herpes (Tox50, 70-100 µM; see
Alarcon et al., 1984, Antivir. Res. 4:231-243, and Alarcon et al., 1988, FEBS Lett.
231:207-211, both of which are incorporated herein by reference).

Strikingly, the megalomicins are potent antiparasitic agents, showing an
IC50 of 1 µg/ml in blocking intracellular replication of Plasmodium falciparum
infected erythrocytes (see Bonay et al., 1998, Antimicrob. Agents Chemother.
42:2668-2673, incorporated herein by reference). The megalomicins are effective
against Trypanosoma cruzi and T. brucei (IC50, 0.2-2 ng/ml) plus Leishmania
donovani and L. major promastigotes (IC50, 3 and 8 µg/ml, respectively).
Megalomicin is also active against the intracellular replicative, amastigote form of
T. cruzi, completely preventing its replication in infected murine LLC/MK2
macrophages at a dose of 5 µg/ml. Importantly, the effective drug concentration is
500-fold less than the acute LD50 in mammals, and there is no toxicity to BALB/c
mice at doses (50 mg/kg) that are completely curative for T. brucei infections.
Because the erythromycins do not have such activity, although azithromycin
(Figure 3) has been reported to be an effective acute and prophylactic treatment for
malaria caused by P. vivax and P. falciparum (see Taylor et al., 1999, Clin. Infect.
Dis. 28:74-81, incorporated herein by reference), the antiparasitic action of the
megalomicins is unique and probably related to the presence of the deoxyamino
sugar megosamine at C-6 (Figure 3). Consequently, the megalomicins could be
developed into potent antimalarial drugs with a high therapeutic index and be
active against P. falciparum and other species that are resistant to currently used
classes of antimalarials. They also could lead to potent antiparasitic agents against
leishmaniasis, trypanosomiasis, and Chagas' disease. In view of the widespread
use of the erythromycins and their good oral availability plus the low mammalian
toxicity of macrolides in general, the megalomicins could be used prophylactically
to combat malaria, and as fermentation products, the megalomicins should be
relatively inexpensive to produce.

The megalomicins belong to the polyketide class of natural products whose
members have diverse structural and pharmacological properties (see Monaghan
and Tkacz, 1990, Annu. Rev. Microbiol. 44: 271, incorporated herein by
reference). The megalomicins are assembled by polyketide synthases through
successive condensations of activated coenzyme-A thioester monomers derived
from small organic acids such as acetate, propionate, and butyrate. Active sites
required for condensation include an acyltransferase (AT), acyl carrier protein
(ACP), and beta-ketoacylsynthase (KS). Each condensation cycle results in a
beta-keto group that undergoes all, some, or none of a series of processing activities.
Active sites that perform these reactions include a ketoreductase (KR),
dehydratase (DH), and enoylreductase (ER). Thus, the absence of any beta-keto
processing domain results in the presence of a ketone, a KR alone gives rise to a
hydroxyl, a KR and DH result in an alkene, while a KR, DH, and ER combination
leads to complete reduction to an alkane. After assembly of the polyketide chain,
the molecule typically undergoes cyclization(s) and post-PKS modification (e.g.
glycosylation, oxidation, acylation) to achieve the final active compound.
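For illustration only, the beta-keto processing rule stated above can be written as a short Python sketch. The sketch is not part of the original disclosure; the function name `beta_carbon_state` is hypothetical, and treating a DH or ER without an active KR as leaving the ketone untouched is an assumption of the sketch rather than a statement from the text.

```python
# Illustrative sketch only: names are hypothetical and do not refer to any
# existing PKS analysis library.

def beta_carbon_state(domains):
    """Return the functional group left at the beta-carbon by one module.

    `domains` is an iterable of active reductive domains drawn from
    {"KR", "DH", "ER"}.
    """
    active = set(domains)
    if "KR" not in active:
        # Assumption: without a KR, DH and ER have no substrate to act on.
        return "ketone"
    if "DH" not in active:
        return "hydroxyl"   # KR alone reduces the ketone to an alcohol
    if "ER" not in active:
        return "alkene"     # KR + DH: reduction followed by dehydration
    return "alkane"         # KR + DH + ER: complete reduction


for combo in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
    print(combo, "->", beta_carbon_state(combo))
```

The four combinations printed by the loop correspond one-to-one to the four cases listed in the paragraph above.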

Macrolides such as erythromycin and megalomicin are synthesized by
modular PKSs (see Cane et al., 1998, Science 282: 63, incorporated herein by
reference). For illustrative purposes, the PKS that produces the erythromycin
polyketide (6-deoxyerythronolide B synthase or DEBS; see U.S. Patent No.
5,824,513, incorporated herein by reference) is shown in Figure 4. DEBS is the
best characterized and most extensively used modular PKS system. DEBS is
particularly relevant to the present invention in that it synthesizes the same
polyketide, 6-deoxyerythronolide B (6-dEB), synthesized by the megalomicin
PKS. In modular PKS enzymes such as DEBS and the megalomicin PKS, the
enzymatic steps for each round of condensation and reduction are encoded within
a single "module" of the polypeptide (i.e., one distinct module for every
condensation cycle). DEBS consists of a loading module and 6 extender modules
and a chain terminating thioesterase (TE) domain within three extremely large
polypeptides encoded by three open reading frames (ORFs, designated eryAI,
eryAII, and eryAIII).

Each of the three polypeptide subunits of DEBS (DEBSI, DEBSII, and
DEBSIII) contains 2 extender modules; DEBSI additionally contains the loading
module. Collectively, these proteins catalyze the condensation and appropriate
reduction of 1 propionyl CoA starter unit and 6 methylmalonyl CoA extender
units. Modules 1, 2, 5, and 6 contain KR domains; module 4 contains a complete
set, KR/DH/ER, of reductive and dehydratase domains; and module 3 contains no
functional reductive domain. Following the condensation and appropriate
dehydration and reduction reactions, the enzyme bound intermediate is lactonized
by the TE at the end of extender module 6 to form 6-dEB.
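The domain content of the six DEBS extender modules described above can be tabulated in a brief Python sketch that predicts the functional group each module leaves behind. This is an illustration under stated assumptions, not part of the original disclosure: all names are hypothetical, and the mapping from module number to backbone carbon assumes the conventional numbering in which C-1 is the carboxyl carbon released by the TE, which the text itself does not state.

```python
# Illustrative sketch; names are hypothetical. Domain content per module is
# taken from the description of DEBS above.
DEBS_REDUCTIVE_DOMAINS = {
    1: {"KR"},
    2: {"KR"},
    3: set(),               # no functional reductive domain
    4: {"KR", "DH", "ER"},  # complete set
    5: {"KR"},
    6: {"KR"},
}


def beta_carbon_state(active):
    """Functional group left at the beta-carbon for a set of active domains."""
    if "KR" not in active:
        return "ketone"
    if "DH" not in active:
        return "hydroxyl"
    if "ER" not in active:
        return "alkene"
    return "alkane"


def processed_carbon(module, n_modules=6):
    """Backbone carbon acted on by the reductive domains of a module.

    Assumption (not stated in the text): C-1 is the carboxyl carbon released
    by the TE, so module n acts on carbon 2 * (n_modules - n) + 3.
    """
    return 2 * (n_modules - module) + 3


def reduction_pattern(domains_by_module):
    """Return {carbon position: functional group} for every extender module."""
    n = len(domains_by_module)
    return {
        processed_carbon(m, n): beta_carbon_state(d)
        for m, d in sorted(domains_by_module.items())
    }


for carbon, group in sorted(reduction_pattern(DEBS_REDUCTIVE_DOMAINS).items()):
    print(f"C-{carbon}: {group}")
```

Under these assumptions the sketch places the single ketone at C-9 (module 3) and the fully reduced methylene at C-7 (module 4). The C-13 hydroxyl is the one consumed when the TE lactonizes the chain.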

More particularly, the loading module of DEBS consists of two domains,
an acyltransferase (AT) domain and an acyl carrier protein (ACP) domain. In
other PKS enzymes, the loading module is not composed of an AT and an ACP
but instead utilizes an inactivated KS, an AT, and an ACP. This inactivated KS is
in most instances called KS^Q, where the superscript letter is the abbreviation for
the amino acid, glutamine, that is present instead of the active site cysteine
required for activity. The AT domain of the loading module recognizes a
particular acyl-CoA (propionyl for DEBS, which can also accept acetyl) and
transfers it as a thiol ester to the ACP of the loading module. Concurrently, the AT
on each of the extender modules recognizes a particular extender-CoA
(methylmalonyl for DEBS) and transfers it to the ACP of that module to form a
thioester. Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of
the loading module migrates to form a thiol ester (trans-esterification) at the KS of
the first extender module; at this stage, extender module 1 possesses an acyl-KS
and a methylmalonyl ACP. The acyl group derived from the loading module is
then covalently attached to the alpha-carbon of the malonyl group to form a
carbon-carbon bond, driven by concomitant decarboxylation, and generating a new
acyl-ACP that has a backbone two carbons longer than the loading unit
(elongation or extension). The growing polyketide chain is transferred from the
ACP to the KS of the next module, and the process continues.
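The carbon bookkeeping of the priming and condensation cycle just described can be followed with a small Python sketch. It is offered only as an illustration: the class and function names are hypothetical, and the sketch tracks total carbon count only, ignoring stereochemistry and beta-keto processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcylUnit:
    """An acyl group as loaded onto an ACP (hypothetical helper type)."""
    name: str
    carbons: int


PROPIONYL = AcylUnit("propionyl", 3)          # DEBS starter unit
METHYLMALONYL = AcylUnit("methylmalonyl", 4)  # DEBS extender unit


def condense(chain_carbons, extender):
    """One decarboxylative condensation.

    The chain gains the extender's carbons minus the one lost as CO2.
    """
    return chain_carbons + extender.carbons - 1


def run_assembly_line(starter, extenders):
    """Return the total carbon count of the chain after each module."""
    chain = starter.carbons
    history = [chain]
    for extender in extenders:
        chain = condense(chain, extender)
        history.append(chain)
    return history


history = run_assembly_line(PROPIONYL, [METHYLMALONYL] * 6)
print(history)  # [3, 6, 9, 12, 15, 18, 21]
```

Each methylmalonyl extension adds three carbons in total: two to the backbone, as the text states, plus one methyl branch. The final count of 21 matches the carbon count of 6-dEB.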

The polyketide chain, growing by two carbons each module, is sequentially
passed as a covalently bound thiol ester from module to module, in an assembly
line-like process. The carbon chain produced by this process alone would possess
a ketone at every other carbon atom, producing a polyketone, from which the
name polyketide arises. Commonly, however, the beta keto group of each
two-carbon unit is modified just after it has been added to the growing polyketide
chain but before it is transferred to the next module by either a KR, a KR plus a
DH, or a KR, a DH, and an ER. As noted above, modules may contain additional
enzymatic activities as well.

Once a polyketide chain traverses the final extender module of a PKS, it
encounters the releasing domain or thioesterase found at the carboxyl end of most
PKSs. Here, the polyketide is cleaved from the enzyme and cyclized. The
resulting polyketide can be modified further by tailoring or modification enzymes;
these enzymes add carbohydrate groups or methyl groups, or make other
modifications, e.g., oxidation or reduction, on the polyketide core molecule. For
example, the final steps in conversion of 6-dEB to erythromycin A involve a
number of modification enzymes, which carry out C-6 hydroxylation,
attachment of mycarose and desosamine sugars, C-12 hydroxylation (which
produces erythromycin C), and conversion of mycarose to cladinose via
O-methylation, as shown in Figure 5.

With this overview of PKS and post-PKS modification enzymes, one can 
better appreciate the recombinant megalomicin biosynthetic genes provided by the 
invention and their function, as described in the following Section. 


Section III: The Megalomicin Biosynthetic Genes and Nucleic Acid Fragments 

The megalomicin PKS was isolated and cloned by the following
procedure. Genomic DNA was isolated from a megalomicin producing strain of
Micromonospora megalomicea subsp. nigra (ATCC 27598), partially digested
with a restriction enzyme, and cloned into a commercially available cosmid vector
to produce a genomic library. This library was then probed with probes generated
from the erythromycin biosynthetic genes as well as from cosmids identified as
containing sequences homologous to erythromycin biosynthetic genes. This
probing identified a set of cosmids, which were analyzed by DNA sequence
analysis and restriction enzyme digestion. The analysis revealed that the desired
DNA had been isolated and that the entire PKS gene cluster was contained in
overlapping segments on four of the cosmids identified. Figure 1 shows the
cosmids, and the portions of the megalomicin biosynthetic gene cluster in the
insert DNA of the cosmids. Figure 1 also shows that the complete megalomicin
biosynthetic gene cluster is contained within the insert DNA of cosmids
pKOS079-138B, pKOS079-124B, pKOS079-93D, and pKOS079-93A. Each of
these cosmids has been deposited with the American Type Culture Collection in
accordance with the terms of the Budapest Treaty (cosmid pKOS079-138B is
available under accession no. ATCC ; cosmid pKOS079-124B is available
under accession no. ATCC ; cosmid pKOS079-93D is available under
accession no. ATCC ; and cosmid pKOS079-93A is available under
accession no. ATCC ). Various additional reagents of the invention can be
isolated from these cosmids. DNA sequence analysis was also performed on the
various subclones of the invention, as described herein. Further analysis of these
cosmids and subclones prepared from the cosmids facilitated the identification of
the location of various megalomicin biosynthetic genes, including the ORFs
encoding the PKS, modules encoded by those ORFs, and coding sequences for
megalomicin modification enzymes. The location of these genes and modules is
shown on Figure 2.

Those of skill in the art will recognize that, due to the degenerate nature of
the genetic code, a variety of DNA compounds differing in their nucleotide
sequences can be used to encode a given amino acid sequence of the invention.
The native DNA sequence encoding the megalomicin PKS and other biosynthetic
enzymes of Micromonospora megalomicea is
shown herein merely to illustrate a preferred embodiment of the invention, and the
invention includes DNA compounds of any sequence that encode the amino acid
sequences of the polypeptides and proteins of the invention. In similar fashion, a
polypeptide can typically tolerate one or more amino acid substitutions, deletions,
and insertions in its amino acid sequence without loss or significant loss of a
desired activity. The present invention includes such polypeptides with alternate
amino acid sequences, and the amino acid sequences encoded by the DNA
sequences shown herein merely illustrate preferred embodiments of the invention.
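
The degeneracy of the genetic code referred to above can be made concrete with a short Python sketch. The example is illustrative only and not part of the original disclosure: the two DNA sequences and the four-residue peptide are invented for the demonstration, the helper names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_encodings(peptide):
    """Number of distinct DNA sequences that encode the given peptide."""
    total = 1
    for aa in peptide:
        total *= sum(1 for residue in CODON_TABLE.values() if residue == aa)
    return total


# Two invented sequences that differ at several positions.
seq_a = "ATGGCCCTGCGC"
seq_b = "ATGGCGTTACGT"

print(translate(seq_a), translate(seq_b))  # MALR MALR
print(count_encodings("MALR"))             # 144
```

Even this four-residue peptide can be encoded by 144 different DNA sequences, which is why a native coding sequence is only one of many that specify the same polypeptide.
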
The recombinant nucleic acids, proteins, and peptides of the invention are
many and diverse. To facilitate an understanding of the invention and the diverse
compounds and methods provided thereby, the following description of the
various regions of the megalomicin PKS and the megalomicin modification
enzymes and corresponding coding sequences is provided. To facilitate description
of the invention, reference to a PKS, protein, module, or domain herein can also
refer to DNA compounds comprising coding sequences therefor and vice versa.
Also, unless otherwise indicated, reference to a heterologous PKS refers to a PKS
or DNA compounds comprising coding sequences therefor from an organism
other than Micromonospora megalomicea. In addition, reference to a PKS or its
coding sequence includes reference to any portion thereof.

Thus, the invention provides DNA molecules in isolated (i.e., not pure, but
existing in a preparation in an abundance and/or concentration not found in nature)
and purified (i.e., substantially free of contaminating materials or substantially free
of materials with which the corresponding DNA would be found in nature) form.
The DNA molecules of the invention comprise one or more sequences that encode
one or more domains (or fragments of such domains) of one or more modules in
one or more of the ORFs of the megalomicin PKS and sequences that encode
megalomicin modification enzymes from the megalomicin biosynthetic gene
cluster. Examples of PKS domains include the KS, AT, DH, KR, ER, ACP, and
TE domains of at least one of the 6 extender modules and loading module of the
three proteins encoded by the three ORFs of the megalomicin PKS gene cluster.
Examples of megalomicin modification enzymes include those that synthesize the
mycarose, desosamine, and megosamine moieties, those that transfer those sugar
moieties to the polyketide 6-dEB, those that hydroxylate the polyketide at C-6 and
C-12, and those that acylate the sugar moieties.

In an especially preferred embodiment, the DNA molecule is a
recombinant DNA expression vector or plasmid, as described in more detail in the
following Section. Generally, such vectors can either replicate in the cytoplasm of
the host cell or integrate into the chromosomal DNA of the host cell. In either
case, the vector can be a stable vector (i.e., the vector remains present over many
cell divisions, even if only with selective pressure) or a transient vector (i.e., the
vector is gradually lost by host cells with increasing numbers of cell divisions).

The megalomicin PKS gene cluster comprises three ORFs (megAI, megAII,
and megAIII). Each ORF encodes two extender modules of the PKS; the first ORF
also encodes the loading module. Each extender module is composed of at least a
KS, an AT, and an ACP domain. The locations of the various encoding regions of
these ORFs are shown in Figure 2 and described with reference to the sequence
information below. The megalomicin PKS produces the polyketide known as
6-dEB, shown in Figure 4. In megalomicin-producing organisms, 6-dEB is
converted to erythromycin C by a set of modification enzymes. Thus, 6-dEB is
converted to erythronolide B by the megF gene product (a homolog of the eryF
gene product), then to 3-alpha-mycarosyl-erythronolide B by the megBV gene
product (a homolog of the eryBV gene product), then to erythromycin D by the
megCIII gene product (a homolog of the eryCIII gene product), then to
erythromycin C by the megK gene product (a homolog of the eryK gene product).
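The four-step conversion of 6-dEB to erythromycin C recited above can be represented as an ordered list of steps, as in the following Python sketch. The sketch is illustrative and not part of the original disclosure: the names `PATHWAY` and `final_product` are hypothetical, and the model assumes a strictly linear pathway in which disrupting a gene simply halts conversion at that step. Real pathways may be more permissive.

```python
# Each step is (substrate, gene, product), in the order given in the text.
PATHWAY = [
    ("6-dEB", "megF", "erythronolide B"),
    ("erythronolide B", "megBV", "3-alpha-mycarosyl-erythronolide B"),
    ("3-alpha-mycarosyl-erythronolide B", "megCIII", "erythromycin D"),
    ("erythromycin D", "megK", "erythromycin C"),
]


def final_product(start, disrupted=()):
    """Follow the pathway from `start` until a disrupted gene is reached.

    Simplifying assumption: a disrupted step blocks all later steps, so the
    substrate of that step is returned as the end product.
    """
    compound = start
    for substrate, gene, product in PATHWAY:
        if compound != substrate:
            continue            # this step does not act on the current compound
        if gene in disrupted:
            break               # pathway is blocked here
        compound = product
    return compound


print(final_product("6-dEB"))                         # erythromycin C
print(final_product("6-dEB", disrupted={"megCIII"}))  # 3-alpha-mycarosyl-erythronolide B
```

The second call shows the kind of question such a model can answer: which intermediate would be expected to remain if one modification gene were inactivated, under the linearity assumption stated above.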

In addition to these modification enzymes, such megalomicin-producing
organisms also contain the modification enzymes necessary for the biosynthesis of
the desosamine and mycarose moieties that are similarly utilized in erythromycin
biosynthesis, as shown in Figure 5. Megalomicin A contains the complete
erythromycin C structure, and its biosynthesis additionally involves the formation
of L-megosamine (L-rhodosamine) and its attachment to the C-6 hydroxyl
(Figures 3 and 5, inset), followed by acylation of the C-3''' and(or) C-4'''
hydroxyls as the terminal steps. L-megosamine is the same as
N-dimethyl-L-daunosamine; the daunosamine genes have been characterized from
Streptomyces peucetius (see Colombo and Hutchinson, J. Indust. Microbiol.
Biotechnol., in press; Otten et al., 1996, J. Bacteriol. 178:7316-7321, and
references cited therein).
Some of the rhodosamine genes also have been cloned and partially characterized
from another anthracycline producing Streptomyces sp. (see Torkkell et al., 1997,
Mol. Gen. Genet. 256(2):203-209). Because the timing of the glycosylation with
TDP-megosamine in relation to the addition of mycarose and desosamine to
erythronolide B, plus the C-12 hydroxylation, is unknown, the pathway could
involve a different order of glycosylation and C-12 hydroxylation steps than the
one shown in Figure 5. Regardless, the megalomicin biosynthetic gene cluster
contains the genes to make L-rhodosamine and attach it to the correct macrolide
substrate.

The biosynthetic pathways to make the glycosides desosamine, mycarose,
and megosamine are shown in Figure 6. The present invention provides the genes
for each biosynthetic pathway shown in this Figure, and these recombinant genetic
pathways can be used alone or in any combination to confer the pathway to a
heterologous host.

The megalomicin PKS locus is similar to the eryA locus in size and
organization. Most of the deoxysugar biosynthesis genes are homologs of the eryB
mycarose and eryC desosamine biosynthesis and glycosyl attachment genes from
Saccharopolyspora erythraea (see Summers et al., 1997, Microbiol. 143:3251-
3262; Haydock et al., 1991, Mol. Gen. Genet. 230:120-128; Gaisser et al., 1997,
Mol. Gen. Genet. 256:239-251; Gaisser et al., 1998, Mol. Gen. Genet. 257:78-88,
incorporated herein by reference) or the picC homologs from the picromycin and
narbomycin producer (see PCT patent publication No. 99/61599 and Xue et al.,
1998, Proc. Natl. Acad. Sci. USA 95:12111-12116, incorporated herein by
reference). The TDP-megosamine biosynthesis genes are homologs of the dnm
genes (see Figure 5) and the pikromycin N-dimethyltransferase gene or its
homologs reported in a cluster of L-rhodosamine biosynthesis genes. The putative
TDP-megosamine glycosyltransferase gene product (geneX in Figure 5) closely
resembles the deduced products of the eryBV, eryCIII, dnmS, and pikromycin
desVII genes, even though it recognizes different substrates than the products of
each of these genes.

The following Table 1 shows the location of the genes in the
Micromonospora megalomicea megalomicin biosynthetic pathway in the DNA
sequence set forth in SEQ ID NO:1 (see also Figure 7; note some gene
designations may be different in Figure 7).



Table 1. Megalomicin Biosynthetic Gene Cluster
Micromonospora megalomicea subsp. nigra (ATCC 27598)

Location                    Description

1..2451                     sequence from cosmid pKOS079-138B
complement(1..144)          megBVI (or megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase
928..2061                   megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase
2072..3382                  megDI, TDP-megosaminyl transferase (eryCIII homolog)
2452..40397                 sequence of cosmid pKOS079-93D
3462..4634                  megG (or megY), mycarosyl acyl transferase
4651..5775                  megDII, deoxysugar transaminase (eryCI, DnrJ homolog)
5822..6598                  megDIII, TDP-daunosaminyl-N,N-dimethyltransferase (eryCVI homolog)
6592..7197                  megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU homolog)
7220..8206                  megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog)
complement(8228..9220)      megBII-X or megDVII, TDP-4-keto-L-6-deoxy-hexose 2,3-reductase
complement(9226..10479)     megBV, TDP-mycarosyl transferase
complement(10483..11424)    megBIV, TDP-hexose 4-ketoreductase
12181..22821                megAI
12181..13791                Loading Module (L)
12505..13470                AT-L
13576..13791                ACP-L
13849..18207                Extender Module 1 (1)
13849..15126                KS1
15427..16476                AT1
17155..17694                KR1
17947..18207                ACP1
18268..22575                Extender Module 2 (2)
18268..19548                KS2
19876..20910                AT2
21517..22053                KR2
22318..22575                ACP2
22867..33555                megAII
22957..27258                Extender Module 3 (3)
22957..24237                KS3
24544..25581                AT3
26230..26733                KR3 (inactive)
26998..27258                ACP3
27313..33312                Extender Module 4 (4)
27393..28590                KS4
28897..29931                AT4
29953..30477                DH4
31396..32244                ER4
32257..32799                KR4
33052..33312                ACP4
33666..43271                megAIII
33780..38120                Extender Module 5 (5)
33780..35027                KS5
35385..36419                AT5
37068..37604                KR5
37860..38120                ACP5
38187..42425                Extender Module 6 (6)
38187..39470                KS6
39795..40811                AT6
40398..46641                sequences from cosmid pKOS079-93A
41406..41936                KR6
42168..42425                ACP6
42585..43271                TE
43268..44344                megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase
44355..45623                megCIII, TDP-desosaminyl transferase
45620..46591                megBII, TDP-4-keto-6-deoxy-L-glucose 2,3-dehydratase
complement(46660..47403)    megH, TEII
complement(47411..47980)    megF, C-6 hydroxylase
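
Location strings of the kind used in Table 1 are straightforward to process programmatically. The following Python sketch, which is illustrative only and not part of the original disclosure, parses a location, reports the strand, and checks whether the span is a whole number of codons. The helper names are hypothetical, and the four coordinates are copied from Table 1 as read from the printed specification.

```python
import re

# Matches "start..end" or "complement(start..end)".
LOCATION_RE = re.compile(r"^(complement\()?(\d+)\.\.(\d+)\)?$")


def parse_location(text):
    """Return (start, end, strand) for a Table 1 style location string."""
    match = LOCATION_RE.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    strand = "-" if match.group(1) else "+"
    return int(match.group(2)), int(match.group(3)), strand


def length_bp(text):
    """Length of the span in base pairs (coordinates are inclusive)."""
    start, end, _ = parse_location(text)
    return end - start + 1


# Coordinates copied from Table 1.
ROWS = {
    "megAI": "12181..22821",
    "megAII": "22867..33555",
    "megAIII": "33666..43271",
    "megBV": "complement(9226..10479)",
}

for gene, location in ROWS.items():
    n = length_bp(location)
    frame = "whole codons" if n % 3 == 0 else "not a multiple of 3"
    print(gene, parse_location(location), n, "bp,", frame)
```

A check of this kind is a convenient sanity test when coordinates have been transcribed by hand, since a coding region whose length is not divisible by three usually signals a transcription error or a partial ORF.
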



In a specific embodiment, the invention provides an isolated nucleic acid
fragment comprising a nucleotide sequence encoding a domain of the
megalomicin polyketide synthase or a megalomicin modification enzyme. The
isolated nucleic acid fragment can be a DNA or a RNA. Preferably, the isolated
nucleic acid fragment is a recombinant DNA compound. A nucleotide sequence
that is complementary to the nucleotide sequence encoding a domain of
megalomicin PKS or a megalomicin modification enzyme is also provided.

The isolated nucleic acid fragment can comprise a single, multiple or all
the open reading frame(s) (ORF) of the megalomicin PKS or the megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acids of the invention
also include nucleic acids that encode one or more domains and one or more
modules of the megalomicin PKS. Exemplary domains of the megalomicin PKS
include a TE domain, a KS domain, an AT domain, an ACP domain, a KR
domain, a DH domain and an ER domain. In a preferred embodiment, the nucleic
acid comprises the coding sequence for a loading module, a thioesterase domain,
and all six extender modules of the megalomicin PKS.

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by megF,
megBV, megCIII, megK, megDI and megG (or megY). Megalomicin modification
enzymes also include those enzymes involved in the biosynthesis of mycarose,
megosamine or desosamine, which are used as biosynthetic intermediates in the
biosynthesis of various megalomicin species and other related polyketides. The
enzymes that are involved in biosynthesis of mycarose, megosamine or
desosamine are described in Figures 5 and 10. The megalomicin PKS and
megalomicin modification enzymes are collectively referred to as megalomicin
biosynthetic enzymes; the genes encoding such enzymes are collectively referred
to as megalomicin biosynthetic genes; and nucleic acids that comprise a portion of
or entire megalomicin biosynthetic genes are collectively referred to as
megalomicin biosynthetic nucleic acid(s).

In specific embodiments, the megalomicin biosynthetic nucleic acids
comprise the sequence of SEQ ID NO:1, or the coding regions thereof, or
nucleotide sequences encoding, in whole or in part, a megalomicin biosynthetic
enzyme protein. The isolated nucleic acids typically consist of at least 25
(continuous) nucleotides, 50 nucleotides, 100 nucleotides, 150 nucleotides, or 200
nucleotides of megalomicin biosynthetic nucleic acid sequence, or a full-length
megalomicin biosynthetic coding sequence. In another embodiment, the nucleic
acids are smaller than 35, 200, or 500 nucleotides in length. Nucleic acids can be
single or double stranded. Nucleic acids that hybridize to or are complementary to
the foregoing sequences, in particular the inverse complement to nucleic acids that
hybridize to the foregoing sequences (i.e., the inverse complement of a nucleic
acid strand has the complementary sequence running in reverse orientation to the
strand so that the inverse complement would hybridize without mismatches to the
nucleic acid strand) are also provided. In specific aspects, nucleic acids are
provided which comprise a sequence complementary to (specifically are the
inverse complement of) at least 10, 25, 50, 100, or 200 nucleotides or the entire
coding region of a megalomicin biosynthetic gene.
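The definition of the inverse complement given above translates directly into code. The following Python sketch is illustrative only and not part of the original disclosure; the function names are hypothetical and the 12-nucleotide sequence is invented for the demonstration.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def inverse_complement(seq):
    """Complement every base, then reverse the orientation of the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizes_without_mismatch(strand, other):
    """True if `other` is exactly the inverse complement of `strand`."""
    return other == inverse_complement(strand)


strand = "ATGGCCCTGCGC"  # invented example sequence
partner = inverse_complement(strand)

print(partner)                                        # GCGCAGGGCCAT
print(hybridizes_without_mismatch(strand, partner))   # True
```

Applying `inverse_complement` twice returns the original strand, which reflects the symmetry of the relationship described in the text.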

The megalomicin biosynthetic nucleic acids provided herein include those
with nucleotide sequences encoding substantially the same amino acid sequences
as found in native megalomicin biosynthetic enzyme proteins, and those encoding
amino acid sequences with functionally equivalent amino acids, as well as
megalomicin biosynthetic enzyme derivatives or analogs as described in Section
IV.

Some regions within the megalomicin PKS genes are highly homologous
or identical to one another, as can be readily identified by an analysis of the
sequence. The coding sequence for the KS and AT domains of module 2 shares
significant identity with the coding sequence for the KS and AT domains of
module 6. This sequence homology or identity at the nucleic acid, e.g., DNA, level
can render the nucleic acid unstable in certain host cells. To improve the stability
of the nucleic acids comprising a portion or the entire megalomicin PKS genes and
megalomicin modification enzyme genes, the nucleic acid or DNA sequences can
be changed to reduce or abolish the sequence homology or identity. Preferably,
the DNA codons of homologous regions within the PKS or the megalomicin
modification enzyme coding sequence are changed to reduce or abolish the
sequence homology or identity without changing the amino acid sequences
encoded by said changed DNA codons (see the examples below). The stability of
the nucleic acid or DNA can also be improved by codon changes that reduce or
abolish the sequence homology or identity while also changing the amino acid
sequence, provided that the amino acid sequence change(s) does not substantially
change the desired activity of the encoded megalomicin PKS. Thus, for example,
one can simply substitute for the megAIII ORF an ORF from eryAIII, oleAIII,
picAIII, or picAIV genes.
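The preferred approach described above, changing codons to reduce DNA identity without changing the encoded amino acids, can be sketched in a few lines of Python. The sketch is illustrative and not part of the original disclosure: the helper names and the 15-nucleotide repeat are invented, the standard genetic code is assumed, and the choice of replacement codon (the alphabetically first synonym) is arbitrary. A practical design would also weigh host codon usage, which this sketch ignores.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna):
    """Swap each codon for a different synonymous codon where one exists."""
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        synonyms = sorted(
            c for c, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[codon] and c != codon
        )
        out.append(synonyms[0] if synonyms else codon)  # Met and Trp stay as-is
    return "".join(out)


def identity(a, b):
    """Fraction of aligned positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


repeat = "ATGGCCCTGCGCTGG"   # invented repeat encoding Met-Ala-Leu-Arg-Trp
recoded = recode(repeat)

print(recoded)
print(translate(repeat) == translate(recoded))   # True
print(f"DNA identity after recoding: {identity(repeat, recoded):.0%}")
```

The recoded copy specifies the same peptide as the original while sharing fewer nucleotide positions with it, which is the property relied on to reduce homologous recombination between repeated regions.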

The recombinant DNA compounds of the invention that encode the
megalomicin PKS and modification proteins or portions thereof are useful in a
variety of applications. While many of these applications relate to the heterologous
expression of the megalomicin biosynthetic genes or the construction of hybrid
PKS enzymes, many useful applications involve the natural megalomicin producer
Micromonospora megalomicea. For example, one can use the recombinant DNA
compounds of the invention to disrupt the megalomicin biosynthetic genes by
homologous recombination in Micromonospora megalomicea. The resulting host
cell is a preferred host cell for making polyketides modified by oxidation,
hydroxylation, glycosylation, and acylation in a manner similar to megalomicin,
because the genes that encode the proteins that perform these reactions are of
course present in the host cell, and because the host cell does not produce
megalomicin that could interfere with production or purification of the polyketide
of interest.

One illustrative recombinant host cell provided by the present invention
expresses a recombinant megalomicin PKS in which the module 1 KS domain is
inactivated by deletion or other mutation. In a preferred embodiment, the
inactivation is mediated by a change in the KS domain that renders it incapable of
binding substrate (called a KS1° mutation). In a particularly preferred
embodiment, this inactivation is rendered by a mutation in the codon for the active
site cysteine that changes the codon to another codon, such as an alanine codon.
Such constructs are especially useful when placed in translational reading frame
with extender modules 1 and 2 of a megalomicin PKS or the corresponding modules
of another PKS. The utility of these constructs is that host cells expressing, or cell
free extracts containing, a PKS comprising the protein encoded thereby can be fed
or supplied with N-acylcysteamine thioesters of precursor molecules to prepare a
polyketide of interest. See U.S. patent application Serial No. 09/492,773, filed 27
Jan. 2000, and PCT patent publication No. 00/44717, both of which are
incorporated herein by reference. Such KS1° constructs of the invention are useful
in the production of 13-substituted-megalomicin compounds in Micromonospora
megalomicea host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl.

In a variant of this embodiment, one can employ a megalomicin PKS in
which the ACP domain of module 1 has been rendered inactive. In another
embodiment, one can delete the loading domain of the megalomicin PKS and
provide monoketide substrates for processing by the remainder of the PKS.

The compounds of the invention can also be used to construct recombinant 
host cells of the invention in which coding sequences for one or more domains or 
20 modules of the megalomicin PKS or for another megalomicin biosynthetic gene 
have been deleted by homologous recombination with the Micromonospora 
megalomicea chromosomal DNA. Those of skill in the art will appreciate that the 
compounds used in the recombination process are characterized by their homology 
with the chromosomal DNA and not by encoding a functional protein due to their 
25 intended function of deleting or otherwise altering portions of chromosomal DNA. 
For this and a variety of other applications, the compounds of the present 
invention include not only those DNA compounds that encode functional proteins 
but also those DNA compounds that are complementary or identical to any portion 
of the megalomicin biosynthetic genes.

Thus, the invention provides a variety of modified Micromonospora
megalomicea host cells in which one or more of the megalomicin biosynthetic
genes have been mutated or disrupted. Transformation systems for M.
megalomicea have been described by Hasegawa et al., 1991, J. Bacteriol.
173: 7004-11; and Takada et al., 1994, J. Antibiot. 47: 1167-1170, both of which
are incorporated herein by reference. These cells are especially useful when it is 
desired to replace the disrupted function with a gene product expressed by a 
recombinant DNA expression vector. While such expression vectors of the 
invention are described in more detail in the following Section, those of skill in
the art will appreciate that the vectors have application to M. megalomicea as well.
Such M. megalomicea host cells can be preferred host cells for expressing
megalomicin derivatives of the invention. Particularly preferred host cells of this
type include those in which the coding sequence for the loading module has been
mutated or disrupted, those in which one or more of any of the PKS gene ORFs
has been mutated or disrupted, and/or those in which the genes for one or more
modification enzymes (glycosylation, acylation, hydroxylation) have been mutated or
disrupted. 

While the present invention provides many useful compounds having 
application to, and recombinant host cells derived from, Micromonospora
megalomicea, many important applications of the present invention relate to the
heterologous expression of all or a portion of the megalomicin biosynthetic genes
in cells other than M. megalomicea, as described in Section V.

Section IV: The Megalomicin Biosynthetic Enzymes and Antibodies Recognizing
such Enzymes 

In another specific embodiment, the invention provides a substantially 
purified polypeptide, which is encoded by a nucleic acid fragment comprising a 
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at 
least 90% identity to their wild type counterparts.

An exemplary nucleotide sequence encoding, and the corresponding amino
acid sequence of, a megalomicin biosynthetic enzyme is disclosed in SEQ ID
NO:1. Homologs (e.g., nucleic acids of the above-listed genes of species other
than Micromonospora megalomicea) or other related sequences (e.g., paralogs)
can be obtained by low, moderate or high stringency hybridization with all or a
portion of the particular sequence provided as a probe using methods well known 
in the art for nucleic acid hybridization and cloning (e.g., as described in Section 
III) in accordance with the methods of the present invention. 

The megalomicin biosynthetic enzyme proteins, or domains thereof, of the
present invention can be obtained by methods well known in the art for protein
purification and recombinant protein expression in accordance with the methods
of the present invention. For recombinant expression of one or more of the
proteins, the nucleic acid containing all or a portion of the nucleotide sequence
encoding the protein can be inserted into an appropriate expression vector, i.e., a
vector that contains the necessary elements for the transcription and translation of
the inserted protein coding sequence. Transcriptional and translational signals can 
be supplied by the native promoter for a megalomicin biosynthetic gene and/or 
flanking regions. 

A variety of host-vector systems may be utilized to express the protein 
coding sequence. These include but are not limited to mammalian cell systems
infected with virus (e.g., vaccinia virus, adenovirus, and the like); insect cell
systems infected with virus (e.g., baculovirus); microorganisms such as yeast
containing yeast vectors; or bacteria transformed with bacteriophage DNA,
plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their
properties. Depending on the host-vector system utilized, any one of a number of
suitable transcription and translation elements may be used. 

In a specific embodiment, a vector is used that comprises a promoter 
operably linked to nucleic acid sequences encoding a megalomicin biosynthetic 
enzyme, or a domain, fragment, derivative or homolog, thereof, one or more 
origins of replication, and optionally, one or more selectable markers (e.g., an
antibiotic resistance gene). 

Expression vectors containing the sequences of interest can be identified 
by three general approaches: (a) nucleic acid hybridization, (b) presence or
absence of "marker" gene function, and (c) expression of the inserted sequences.
In the first approach, megalomicin biosynthetic nucleic acid sequences can be
detected by nucleic acid hybridization to probes comprising sequences
homologous and complementary to the inserted sequences. In the second
approach, the recombinant vector/host system can be identified and selected based
upon the presence or absence of certain "marker" functions (e.g., binding to an
anti-megalomicin biosynthetic enzyme antibody, resistance to antibiotics,
occlusion body formation in baculovirus, and the like) caused by insertion of the
sequences of interest in the vector. For example, if a megalomicin biosynthetic
gene, or portion thereof, is inserted within the marker gene sequence of the vector,
recombinants containing the megalomicin biosynthetic gene fragment will be
identified by the absence of the marker gene function. In the third approach,
recombinant expression vectors can be identified by assaying for the megalomicin
biosynthetic gene products expressed by the recombinant vector. Such assays can
be based, for example, on the physical or functional properties of the interacting
species in in vitro assay systems, e.g., megalomicin synthesis activity or
immunoreactivity to antibodies specific for the protein.

Once recombinant megalomicin biosynthetic genes or nucleic acids are 
identified, several methods known in the art can be used to propagate them in
accordance with the methods of the present invention. Once a suitable host
system and growth conditions have been established, recombinant expression
vectors can be propagated and amplified in quantity. As previously described, the
expression vectors or derivatives which can be used include, but are not limited to:
human or animal viruses such as vaccinia virus or adenovirus; insect viruses such
as baculovirus; yeast vectors; bacteriophage vectors such as lambda phage; and
plasmid and cosmid vectors. 

In addition, a host cell strain may be chosen that modulates the expression 
of the inserted sequences, or modifies or processes the expressed proteins in the 
specific fashion desired. Expression from certain promoters can be elevated in the
presence of certain inducers; thus expression of the genetically-engineered
megalomicin biosynthetic enzymes may be controlled. Furthermore, different host
cells have characteristic and specific mechanisms for the translational and post-
translational processing and modification (e.g., glycosylation, phosphorylation, and
the like) of proteins. Appropriate cell lines or host systems can be chosen to
ensure the desired modification and processing of the foreign protein is achieved.
For example, expression in a bacterial system can be used to produce an
unglycosylated core protein, while expression in mammalian cells ensures
"native" glycosylation of a heterologous protein. Furthermore, different
vector/host expression systems may effect processing reactions to different extents.

In particular, megalomicin biosynthetic enzyme derivatives can be made by 
altering their sequences by substitutions, additions or deletions that provide for 
functionally equivalent molecules. Due to the degeneracy of nucleotide coding 
sequences, other DNA sequences which encode substantially the same amino acid
sequence as a megalomicin biosynthetic gene can be used in the practice of the
present invention. These include but are not limited to nucleotide sequences
comprising all or portions of megalomicin biosynthetic genes that are altered by
the substitution of different codons that encode the same amino acid residue within the
sequence, thus producing a silent change. Likewise, the megalomicin biosynthetic
enzyme derivatives of the invention include, but are not limited to, those 
containing, as a primary amino acid sequence, all or part of the amino acid 
sequence of megalomicin biosynthetic enzymes, including altered sequences in 
which functionally equivalent amino acid residues are substituted for residues 
within the sequence resulting in a silent change. For example, one or more amino
acid residues within the sequence can be substituted by another amino acid of a 
similar polarity which acts as a functional equivalent, resulting in a silent 
alteration. Substitutes for an amino acid within the sequence may be selected 
from other members of the class to which the amino acid belongs. For example, 
the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan and methionine. The polar neutral 
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and 
glutamine. The positively charged (basic) amino acids include arginine, lysine and 
histidine. The negatively charged (acidic) amino acids include aspartic acid and
glutamic acid.
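
The two kinds of "silent" change described above can be illustrated with a minimal sketch. The amino acid classes are transcribed from the text; the codon table is the standard genetic code. The function names are hypothetical and chosen only for this illustration, and membership in the same class is used here as a simple stand-in for "functionally equivalent", which in practice would have to be confirmed experimentally.

```python
# Amino acid classes as listed in the text, written as one-letter codes.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def residue_class(residue):
    """Return the name of the class that a one-letter residue belongs to."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"not one of the 20 common amino acids: {residue!r}")


def is_conservative_substitution(original, substitute):
    """True if both residues fall in the same class (an assumed proxy only)."""
    return residue_class(original) == residue_class(substitute)


def is_silent_codon_change(codon_a, codon_b):
    """True if two DNA codons encode the same amino acid residue."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


if __name__ == "__main__":
    print(is_conservative_substitution("L", "I"))  # both nonpolar
    print(is_conservative_substitution("D", "K"))  # acidic versus basic
    print(is_silent_codon_change("CTG", "CTC"))    # both encode leucine
```

A codon-level check of this kind covers the nucleotide substitutions described in the first half of the paragraph; the class-level check covers the amino acid substitutions described in the second half.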

In a specific embodiment of the invention, the nucleic acids encoding 
proteins and proteins consisting of or comprising a domain or a fragment of 
megalomicin biosynthetic enzyme consisting of at least 6 (continuous) amino
acids are provided. In other embodiments, the domain or fragment consists of at
least 10, 20, 30, 40, or 50 amino acids of a megalomicin biosynthetic enzyme. In
specific embodiments, such domains or fragments are not larger than 35, 100 or
200 amino acids. Derivatives or analogs of megalomicin biosynthetic enzyme
include but are not limited to molecules comprising regions that are substantially
homologous to megalomicin biosynthetic enzyme (in various embodiments, at least
30%, 40%, 50%, 60%, 70%, 80%, 90% or 95% identity over an amino acid
sequence of identical size or when compared to an aligned sequence in which the
alignment is done by a computer homology program known in the art in
accordance with the methods of the present invention), or whose encoding nucleic
acid is capable of hybridizing to a sequence encoding a megalomicin biosynthetic
enzyme under stringent, moderately stringent, or nonstringent conditions.
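
As a minimal sketch of how a percent identity figure of the kind recited above might be computed, the following assumes that the two sequences have already been aligned to equal length, with "-" marking gaps. The function names are hypothetical, and the choice of denominator (all aligned columns except those that are gaps in both sequences) is an assumption; homology programs differ on this point, so reported values can vary between tools.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have identical length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the identity is at least the stated threshold, e.g. 60 or 90."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


if __name__ == "__main__":
    # Two hypothetical 10-residue fragments differing at two positions.
    print(percent_identity("MKTAYIAKQR", "MKSAYLAKQR"))
```

The example sequences are invented for illustration and are not taken from SEQ ID NO:1.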

The megalomicin biosynthetic enzyme domains, derivatives and analogs of 
the invention can be produced by various methods known in the art in accordance
with the methods of the present invention. The manipulations which result in their
production can occur at the gene or protein level. For example, the cloned
megalomicin biosynthetic gene sequence can be modified by any of numerous
strategies known in the art (Sambrook et al., 1990, Molecular Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York) in accordance with the methods of the present invention. The
sequences can be cleaved at appropriate sites with restriction endonuclease(s),
followed by further enzymatic modification if desired, isolated, and ligated in 
vitro. 

Additionally, the megalomicin biosynthetic enzyme-encoding nucleotide 
sequence can be mutated in vitro or in vivo, to create and/or destroy translation,
initiation, and/or termination sequences, or to create variations in coding regions 
and/or form new restriction endonuclease sites or destroy pre-existing ones, to 
facilitate further in vitro modification. Any technique for mutagenesis known in 
the art can be used in accordance with the methods of the present invention, 
including but not limited to, chemical mutagenesis and in vitro site-directed
mutagenesis (Hutchinson et al., J. Biol. Chem. 253:6551-6558 (1978)), use of 
TAB® linkers (Pharmacia), and the like.

Once a recombinant cell expressing a megalomicin biosynthetic enzyme
protein, or a domain, fragment or derivative thereof, is identified, the individual
gene product can be isolated and analyzed. This is achieved by assays based on
the physical and/or functional properties of the protein, including, but not limited
to, radioactive labeling of the product followed by analysis by gel electrophoresis,
immunoassay, cross-linking to marker-labeled product, and the like. 

The megalomicin biosynthetic enzyme proteins may be isolated and 
purified by standard methods known in the art from recombinant host cells
expressing the complexes or proteins in accordance with the methods of the
invention, including but not restricted to column chromatography (e.g., ion
exchange, affinity, gel exclusion, reversed-phase high pressure, fast protein liquid,
and the like), differential centrifugation, differential solubility, or by any other
standard technique used for the purification of proteins. Functional properties
may be evaluated using any suitable assay known in the art in accordance with the
methods of the present invention.

Alternatively, once a megalomicin biosynthetic enzyme or its domain or 
derivative is identified, the amino acid sequence of the protein can be deduced
from the nucleotide sequence of the gene which encodes it. As a result, the
protein or its domain or derivative can be synthesized by standard chemical
methods known in the art in accordance with the methods of the present invention
(see Hunkapiller et al., Nature 310:105-111 (1984)).

Manipulations of megalomicin biosynthetic enzymes may be made at the 
protein level. Included within the scope of the invention are megalomicin 
biosynthetic enzyme domains, derivatives or analogs or fragments, which are 
differentially modified during or after translation, e.g., by glycosylation,
acetylation, phosphorylation, amidation, derivatization by known 
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule 
or other cellular ligand, and the like. Any of numerous chemical modifications 
may be carried out by known techniques, including but not limited to specific 
chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8
protease, NaBH4, acetylation, formylation, oxidation, reduction, metabolic
synthesis in the presence of tunicamycin, and the like.

In specific embodiments, the megalomicin biosynthetic enzymes are
modified to include a fluorescent label. In other specific embodiments, the
megalomicin biosynthetic enzyme is modified to have a heterofunctional reagent;
such heterofunctional reagents can be used to crosslink the members of the
complex.

In addition, domains, analogs and derivatives of a megalomicin 
biosynthetic enzyme can be chemically synthesized. For example, a peptide 
corresponding to a portion of a megalomicin biosynthetic enzyme, which 
comprises the desired domain or which mediates the desired activity in vitro can
be synthesized by use of a peptide synthesizer. Furthermore, if desired,
nonclassical amino acids or chemical amino acid analogs can be introduced as a
substitution or addition into the megalomicin biosynthetic enzyme sequence.
Non-classical amino acids include but are not limited to the D-isomers of the
common amino acids, alpha-amino isobutyric acid, 4-aminobutyric acid,
2-aminobutyric acid, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid,
3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline,
sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, fluoro-amino acids, designer amino acids such as
β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and amino
acid analogs in general. Furthermore, the amino acid can be D (dextrorotary) or L
(levorotary).

In cases where natural products are suspected of being mutant or are 
isolated from new species, the amino acid sequence of the megalomicin 
biosynthetic enzyme isolated from the natural source, as well as those expressed in
vitro, or from synthesized expression vectors in vivo or in vitro, can be determined
from analysis of the DNA sequence, or alternatively, by direct sequencing of the 
isolated protein. Such analysis may be performed by manual sequencing or 
through use of an automated amino acid sequenator. 

The megalomicin biosynthetic enzyme proteins may also be analyzed by
hydrophilicity analysis (Hopp and Woods, Proc. Natl. Acad. Sci. USA 78:3824-
3828 (1981)). A hydrophilicity profile can be used to identify the hydrophobic
and hydrophilic regions of the proteins, and help predict their orientation in
designing substrates for experimental manipulation, such as in binding
experiments, antibody synthesis, and the like. Secondary structural analysis can
also be done to identify regions of the megalomicin biosynthetic enzyme that
assume specific structures (Chou and Fasman, Biochemistry 13:222-23 (1974)).
Manipulation, translation, secondary structure prediction, hydrophilicity and
hydrophobicity profiles, open reading frame prediction and plotting, and
determination of sequence homologies, can be accomplished using computer
software programs available in the art. 
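
A hydrophilicity profile of the kind referred to above can be sketched as a sliding-window average. The per-residue values below are the Hopp and Woods scale as it is commonly tabulated, and the six-residue window is the size commonly used with that scale; both should be checked against the cited reference before being relied upon. The function names and the example sequence are hypothetical.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (one-letter codes).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(sequence, window=6):
    """Average hydrophilicity of every window of `window` residues."""
    seq = sequence.upper()
    if window < 1 or len(seq) < window:
        raise ValueError("sequence must be at least one window long")
    scores = [HOPP_WOODS[residue] for residue in seq]  # KeyError if unknown
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophilic_window(sequence, window=6):
    """Return (start index, window residues, average) for the top window."""
    profile = hydrophilicity_profile(sequence, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window], profile[start]


if __name__ == "__main__":
    # Invented sequence: a charged stretch flanked by leucine runs.
    print(most_hydrophilic_window("LLLLLLDEKRDELLLLLL"))
```

Windows with the highest average are candidates for surface-exposed regions, which is why such profiles are used when choosing peptides for antibody production.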

Other methods of structural analysis including but not limited to X-ray 
crystallography (Engstrom, Biochem. Exp. Biol. 11:7-13 (1974)), mass
spectroscopy and gas chromatography (Methods in Protein Science, J. Wiley and
Sons, New York, 1997), and computer modeling (Fletterick and Zoller, eds., 1986,
Computer Graphics and Molecular Modeling, In: Current Communications in
Molecular Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor Press,
New York) can also be employed.

The invention also provides an antibody, or a fragment or derivative
thereof, which immuno-specifically binds to a domain of megalomicin polyketide
synthase (PKS) or a megalomicin modification enzyme. In a specific
embodiment, an antibody which immuno-specifically binds to a domain of the
megalomicin biosynthetic enzyme encoded by a nucleic acid that hybridizes to a
nucleic acid having the nucleotide sequence set forth in SEQ ID NO:1, or a
fragment or derivative of said antibody containing the binding domain thereof is 
provided. Preferably, the antibody is a monoclonal antibody. 

The megalomicin biosynthetic enzyme protein and domains, fragments, 
homologs and derivatives thereof may be used as immunogens to generate 
antibodies which immunospecifically bind such immunogens. Such antibodies
include but are not limited to polyclonal, monoclonal, chimeric, single chain, Fab 
fragments, and an Fab expression library. 

Various procedures known in the art may be used for the production of 
polyclonal antibodies to a megalomicin biosynthetic enzyme protein of the 
invention, its domains, derivatives, fragments or analogs in accordance with the
methods of the present invention. 

For production of the antibody, various host animals can be immunized by 
injection with the native megalomicin biosynthetic enzyme protein or a synthetic
version, or a derivative of the foregoing, such as a cross-linked megalomicin
biosynthetic enzyme. Such host animals include but are not limited to rabbits,
mice, rats, and the like. Various adjuvants can be used to increase the
immunological response, depending on the host species, and include but are not
limited to Freund's (complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human
adjuvants such as bacille Calmette-Guerin (BCG) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed towards a megalomicin
biosynthetic enzyme or domains, derivatives, fragments or analogs thereof, any
technique that provides for the production of antibody molecules by continuous
cell lines in culture may be used. Such techniques include but are not restricted to
the hybridoma technique originally developed by Kohler and Milstein (Nature
256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., Immunology Today 4:72 (1983)), and the EBV hybridoma
technique to produce human monoclonal antibodies (Cole et al., in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 (1985)). In an
additional embodiment, monoclonal antibodies can be produced in germ-free
animals (WO 89/12690). Human antibodies may be used and can be obtained by
using human hybridomas (Cote et al., Proc. Natl. Acad. Sci. USA 80:2026-2030
(1983)) or by transforming human B cells with EBV virus in vitro (Cole et al., in
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96
(1985)). Techniques developed for the production of "chimeric antibodies"
(Morrison et al., Proc. Natl. Acad. Sci. USA 81:6851-6855 (1984); Neuberger et
al., Nature 312:604-608 (1984); Takeda et al., Nature 314:452-454 (1985)) by
splicing the genes from a mouse antibody molecule specific for the megalomicin 
biosynthetic enzyme protein together with genes from a human antibody molecule 
of appropriate biological activity can be used; such antibodies are within the scope 
of this invention. 

Techniques described for the production of single chain antibodies (U.S.
patent 4,946,778) can be adapted to produce megalomicin biosynthetic enzyme-
specific single chain antibodies. An additional embodiment utilizes the techniques
described for the construction of Fab expression libraries (Huse et al., Science
246:1275-1281 (1989)) to allow rapid and easy identification of monoclonal Fab
fragments with the desired specificity for megalomicin biosynthetic enzyme, or 
domains, derivatives, or analogs thereof. Non-human antibodies can be 
"humanized" by known methods (see, e.g., U.S. Patent No. 5,225,539). 

Antibody fragments that contain the idiotypes of a megalomicin
biosynthetic enzyme can be generated by techniques known in the art in
accordance with the methods of the present invention. For example, such
fragments include but are not limited to: the F(ab')2 fragment which can be
produced by pepsin digestion of the antibody molecule; the Fab' fragments that
can be generated by reducing the disulfide bridges of the F(ab')2 fragment; the Fab
fragments that can be generated by treating the antibody molecule with papain
and a reducing agent; and Fv fragments.

In the production of antibodies, screening for the desired antibody can be 
accomplished by techniques known in the art in accordance with the methods of
the present invention, e.g., ELISA (enzyme-linked immunosorbent assay). To
select antibodies specific to a particular domain of the megalomicin biosynthetic 
enzyme, one may assay generated hybridomas for a product that binds to the 
fragment of a megalomicin biosynthetic enzyme that contains such a domain. 

The foregoing antibodies can be used in methods known in the art relating
to the localization and/or quantitation of megalomicin biosynthetic enzyme
proteins, e.g., for imaging these proteins or measuring levels thereof in samples, in
accordance with the methods of the present invention. 

Section V: Heterologous Expression of the Megalomicin Biosynthetic Genes

In one important embodiment, the invention provides methods for the
heterologous expression of one or more of the megalomicin biosynthetic genes
and recombinant DNA expression vectors useful in the method. For purposes of 
the invention, any host cell other than Micromonospora megalomicea is a 
heterologous host cell. Thus, included within the scope of the invention in 
addition to isolated nucleic acids encoding domains, modules, or proteins of the
megalomicin PKS and modification enzymes, are recombinant expression vectors 
that include such nucleic acids. The term expression vector refers to a nucleic acid 
that can be introduced into a host cell or cell-free transcription and translation
system. An expression vector can be maintained permanently or transiently in a
cell, whether as part of the chromosomal or other DNA in the cell or in any
cellular compartment, such as a replicating vector in the cytoplasm. An expression
vector also comprises a promoter that drives expression of an RNA, which
typically is translated into a polypeptide in the cell or cell extract. For efficient
translation of RNA into protein, the expression vector also typically contains a
ribosome-binding site sequence positioned upstream of the start codon of the
coding sequence of the gene to be expressed. Other elements, such as enhancers,
secretion signal sequences, transcription termination sequences, and one or more
marker genes by which host cells containing the vector can be identified and/or
selected, may also be present in an expression vector. Selectable markers, i.e.,
genes that confer antibiotic resistance or sensitivity, are preferred and confer a
selectable phenotype on transformed cells when the cells are grown in an 
appropriate selective medium. 
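
The components enumerated above can be summarized, purely for illustration, as a small record type. This is a bookkeeping sketch and not a description of any actual vector of the invention: the class name, field names and example labels are all hypothetical, and the split between "core" and "typical" elements simply follows the wording of the paragraph (a promoter driving expression of a coding sequence is required; a ribosome-binding site and a selectable marker are typical or preferred).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExpressionVector:
    """Hypothetical record of the elements named in the text."""
    promoter: Optional[str] = None
    ribosome_binding_site: Optional[str] = None
    coding_sequences: List[str] = field(default_factory=list)
    terminators: List[str] = field(default_factory=list)
    selectable_markers: List[str] = field(default_factory=list)

    def missing_core_elements(self) -> List[str]:
        """Elements without which nothing would be expressed."""
        missing = []
        if not self.promoter:
            missing.append("promoter")
        if not self.coding_sequences:
            missing.append("coding sequence")
        return missing

    def advisory_notes(self) -> List[str]:
        """Typical or preferred elements that are absent."""
        notes = []
        if not self.ribosome_binding_site:
            notes.append("no ribosome-binding site: translation may be inefficient")
        if not self.selectable_markers:
            notes.append("no selectable marker: transformants cannot be selected")
        return notes


if __name__ == "__main__":
    vector = ExpressionVector(
        promoter="example_promoter",       # placeholder label
        coding_sequences=["example_orf"],  # placeholder label
    )
    print(vector.missing_core_elements())
    print(vector.advisory_notes())
```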

The various components of an expression vector can vary widely,
depending on the intended use of the vector and the host cell(s) in which the
vector is intended to replicate or drive expression. Expression vector components 
suitable for the expression of genes and maintenance of vectors in E. coli, yeast, 
Streptomyces, and other commonly used cells are widely known and commercially
available. For example, suitable promoters for inclusion in the expression vectors
of the invention include those that function in eucaryotic or procaryotic host cells. 
Promoters can comprise regulatory sequences that allow for regulation of 
expression relative to the growth of the host cell or that cause the expression of a 
gene to be turned on or off in response to a chemical or physical stimulus. For E.
coli and certain other bacterial host cells, promoters derived from genes for
biosynthetic enzymes, antibiotic-resistance conferring enzymes, and phage
proteins can be used and include, for example, the galactose, lactose (lac),
maltose, tryptophan (trp), beta-lactamase (bla), bacteriophage lambda PL, and T5
promoters. In addition, synthetic promoters, such as the tac promoter (U.S. Patent
No. 4,551,433), can also be used.

Thus, recombinant expression vectors contain at least one expression
system, which, in turn, is composed of at least a portion of the megalomicin PKS
and/or other megalomicin biosynthetic gene coding sequences operably linked to a
promoter and optionally termination sequences that operate to effect expression of
the coding sequence in compatible host cells. The host cells are modified by
transformation with the recombinant DNA expression vectors of the invention to
contain the expression system sequences either as extrachromosomal elements or
integrated into the chromosome. The resulting host cells of the invention are
useful in methods to produce PKS and post-PKS modification enzymes as well as
polyketides and antibiotics and other useful compounds derived therefrom. 

Preferred host cells for purposes of selecting vector components for 
expression vectors of the present invention include fungal host cells such as yeast 
and procaryotic host cells such as E. coli and Streptomyces, but mammalian host
cells can also be used. In hosts such as yeasts, plants, or mammalian cells that 
ordinarily do not produce polyketides, it may be necessary to provide, also 
typically by recombinant means, suitable holo-ACP synthases to convert the 
recombinantly produced PKS to functionality. Provision of such enzymes is 
described, for example, in PCT publication Nos. WO 97/13845 and 98/27203,
each of which is incorporated herein by reference. Particularly preferred host cells
for purposes of the present invention are Streptomyces and Saccharopolyspora
host cells, as discussed in greater detail below. 

In a preferred embodiment, the expression vectors of the invention are 
used to construct a heterologous recombinant Streptomyces host cell that expresses
a recombinant PKS of the invention. Streptomyces is a convenient host for 
expressing polyketides, because polyketides are naturally produced in certain 
Streptomyces species, and Streptomyces cells generally produce the precursors 
needed to form the desired polyketide. Those of skill in the art will recognize that, 
if a Streptomyces host cell produces any portion of a PKS enzyme or produces a
polyketide modification enzyme, the recombinant vector need drive expression of 
only those genes constituting the remainder of the desired PKS enzyme or other 
polyketide-modifying enzymes. Thus, such a vector may comprise only a single 
ORF, with the desired remainder of the polypeptides constituting the PKS 
provided by the genes on the host cell chromosomal DNA.

If a Streptomyces or other host cell ordinarily produces polyketides, it may 
be desirable to modify the host so as to prevent the production of endogenous 
polyketides prior to its use to express a recombinant PKS of the invention. Such
modified hosts include S. coelicolor CH999 and similarly modified S. lividans
described in U.S. Patent No. 5,672,491, and PCT publication Nos. WO 95/08548
and WO 96/40968, incorporated herein by reference. In such hosts, it may not be
necessary to provide enzymatic activities for all of the desired post-translational
modifications of the enzymes that make up the recombinantly produced PKS,
because the host naturally expresses such enzymes. In particular, these hosts
generally contain holo-ACP synthases that provide the phosphopantetheinyl
residue needed for functionality of the PKS. 

The invention provides a wide variety of expression vectors for use in
Streptomyces. The replicating expression vectors of the present invention include,
for example and without limitation, those that comprise an origin of replication
from a low copy number vector, such as SCP2* (see Hopwood et al., Genetic
Manipulation of Streptomyces: A Laboratory Manual (The John Innes Foundation,
Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and
Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference),
SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by
reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and
Bierman et al., 1992, Gene 116: 43-49, each of which is incorporated herein by
reference), or a high copy number vector, such as pIJ101 and pJV1 (see Katz et
al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171:
5782-5781; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is
incorporated herein by reference). For non-replicating and integrating vectors and
generally for any vector, it is useful to include at least an E. coli origin of
replication, such as from pUC, p1P, p1I, and pBR. For phage based vectors, the
phage phiC31 and its derivative KC515 can be employed (see Hopwood et al.,
supra). Also, plasmid pSET152, plasmid pSAM, plasmids pSE101 and pSE211,
all of which integrate site-specifically in the chromosomal DNA of S. lividans, can 
be employed for purposes of the present invention. 

The Streptomyces recombinant expression vectors of the invention
typically comprise one or more selectable markers, including antibiotic resistance
conferring genes selected from the group consisting of the ermE (confers
resistance to erythromycin and lincomycin), tsr (confers resistance to
thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4
(confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and
neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to
viomycin) resistance conferring genes. Alternatively, several polyketides are
naturally colored, and this characteristic can provide a built-in marker for
identifying cells.

Megalomicins are currently produced only by the relatively genetically
intractable host Micromonospora megalomicea. This bacterium has not been
commonly used in the fermentation industry for the large-scale production of
antibiotics, and methods for high level production of megalomicin and its analogs
are needed. In contrast, the streptomycete bacteria have been widely used for
almost 50 years and are excellent hosts for production of megalomicin and its
analogs. Streptomyces lividans and S. coelicolor have been developed for the
expression of heterologous PKS systems. These organisms can stably maintain
cloned heterologous PKS genes, express them at high levels under controlled
conditions, and modify the corresponding PKS proteins (e.g., by
phosphopantetheinylation) so that they are capable of production of the polyketide
they encode. Furthermore, these hosts contain the necessary pathways to produce
the substrates required for polyketide synthesis, e.g., propionyl-CoA and
methylmalonyl-CoA. A wide variety of cloning and expression vectors are
available for these hosts, as are methods for the introduction and stable
maintenance of large segments of foreign DNA. Relative to Micromonospora spp.,
S. lividans and S. coelicolor grow well on a number of media and have been
adapted for high level production of polyketides in fermentors. If production levels
are low, a number of rational approaches are available to improve yield (see
Hosted and Baltz, 1996, Trends Biotechnol. 14(7):245-50, incorporated herein by
reference). Empirical methods to increase the titers of these macrolides, long since
proven effective for numerous bacterial polyketides, can also be employed.

Preferred Streptomyces host cell/vector combinations of the invention
include S. coelicolor CH999 and S. lividans K4-114 host cells, which have been
modified so as not to produce the polyketide actinorhodin, and expression vectors
derived from the pRM1 and pRM5 vectors, as described in U.S. Patent Nos.
5,830,750 and 6,022,731 and U.S. patent application Serial No. 09/181,833, filed
28 Oct. 1998, each of which is incorporated herein by reference. These vectors are
particularly preferred in that they contain promoters compatible with numerous
and diverse Streptomyces spp. Particularly useful promoters for Streptomyces host
cells include those from PKS gene clusters that result in the production of
polyketides as secondary metabolites, including promoters from aromatic (Type II)
PKS gene clusters. Examples of Type II PKS gene cluster promoters are act gene
promoters and tcm gene promoters; examples of Type I PKS gene cluster
promoters are the promoters of the spiramycin PKS genes and DEBS genes. The
present invention also provides the megalomicin biosynthetic gene promoters in
recombinant form. These promoters can be used to drive expression of the
megalomicin biosynthetic genes or any other coding sequence of interest in host
cells in which the promoter functions, particularly Micromonospora megalomicea
and generally any Streptomyces species.

As described above, particularly useful control sequences are those that
alone or together with suitable regulatory systems activate expression during
transition from growth to stationary phase in the vegetative mycelium. The
promoter contained in the aforementioned plasmid pRM5, i.e., the actI/actIII
promoter pair and the actII-ORF4 activator gene, is particularly preferred. Other
useful Streptomyces promoters include without limitation those from the ermE
gene and the melC1 gene, which act constitutively, and the tipA gene and the merA
gene, which can be induced at any growth stage. In addition, the T7 RNA
polymerase system has been transferred to Streptomyces and can be employed in
the vectors and host cells of the invention. In this system, the coding sequence for
the T7 RNA polymerase is inserted into a neutral site of the chromosome or in a
vector under the control of the inducible merA promoter, and the gene of interest is
placed under the control of the T7 promoter. As noted above, one or more
activator genes can also be employed to enhance the activity of a promoter.
Activator genes in addition to the actII-ORF4 gene described above include dnrI,
redD, and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra).

To provide a preferred host cell and vector for purposes of the invention,
the megalomicin biosynthetic genes are placed on a recombinant expression vector
and transferred to the non-macrolide producing hosts Streptomyces lividans K4-114
and S. coelicolor CH999. Transformation of S. lividans K4-114 or S.
coelicolor CH999 with this expression vector results in a strain which produces
detectable amounts of megalomicin as determined by analysis of extracts by
LC/MS. As noted above, the present invention also provides recombinant DNA
compounds in which the encoded megalomicin module 1 KS domain is
inactivated (the KS1° mutation). The introduction into Streptomyces lividans or S.
coelicolor of a recombinant expression vector of the invention that encodes a
megalomicin PKS with a KS1° domain produces a host cell useful for making
polyketides by a process known as diketide feeding. The resulting host cells can be
fed or supplied with N-acylcysteamine thioesters of precursor molecules to
prepare megalomicin derivatives. Such cells of the invention are especially useful
in the production of 13-substituted-6-deoxyerythronolide B compounds in
recombinant host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl. In a preferred embodiment, the meg PKS
is produced from a recombinant construct in which the megAIII gene has been
altered to abolish the regions of identical coding sequence it otherwise shares with
the megAI gene, or a hybrid PKS is employed in which the megAIII gene product
has been replaced by the oleAIII gene product. Recombinant oleAIII genes are
described in, for example, PCT patent publication No. 00/026349 and U.S. patent
application Serial No. 09/428,517, filed 28 Oct. 1999, both of which are
incorporated herein by reference.

The recombinant host cells of the invention can express all of the
megalomicin biosynthetic genes or only a subset of the same. For example, if only
the genes for the megalomicin PKS are expressed in a host cell that otherwise does
not produce polyketide modifying enzymes that can act on the polyketide
produced, then the host cell produces unmodified polyketides, called macrolide
aglycones. Such macrolide aglycones can be hydroxylated and glycosylated by
adding them to the fermentation of a strain such as, for example, Streptomyces
antibioticus or Saccharopolyspora erythraea, that contains the requisite
modification enzymes.

There are a wide variety of diverse organisms that can modify macrolide
aglycones to provide compounds with, or that can be readily modified to have,
useful activities. For example, as shown in Figure 5, Saccharopolyspora erythraea
can convert 6-dEB to a variety of useful compounds. The erythronolide 6-dEB is
converted by the eryF gene product to erythronolide B, which is, in turn,
glycosylated by the eryBV gene product to obtain 3-O-mycarosylerythronolide B,
which contains L-mycarose at C-3. The eryCIII gene product then converts this
compound to erythromycin D by glycosylation with D-desosamine at C-5.
Erythromycin D, therefore, differs from 6-dEB through glycosylation and by the
addition of a hydroxyl group at C-6. Erythromycin D can be converted to
erythromycin B in a reaction catalyzed by the eryG gene product by methylating
the L-mycarose residue at C-3. Erythromycin D is converted to erythromycin C by
the addition of a hydroxyl group at C-12 in a reaction catalyzed by the eryK gene
product. Erythromycin A is obtained from erythromycin C by methylation of the
mycarose residue in a reaction catalyzed by the eryG gene product. The
unmodified megalomicin compounds provided by the present invention, such as,
for example, the 6-dEB or 6-dEB analogs, produced in Streptomyces lividans, can
be provided to cultures of S. erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the examples below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production.
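
The conversions described above can be pictured as a small directed graph. The following Python sketch is offered only as an illustration and is not part of the disclosed methods: it encodes the steps exactly as listed in the preceding paragraph and reports which compounds would be expected to accumulate when given gene products are inactive. The names PATHWAY and end_products are hypothetical, and the model assumes that every active gene product converts its substrate completely, which is a simplification.

```python
# Post-PKS steps as listed in the text: substrate -> [(gene, change, product)].
PATHWAY = {
    "6-dEB": [("eryF", "hydroxylation at C-6", "erythronolide B")],
    "erythronolide B": [
        ("eryBV", "L-mycarose at C-3", "3-O-mycarosylerythronolide B")
    ],
    "3-O-mycarosylerythronolide B": [
        ("eryCIII", "D-desosamine at C-5", "erythromycin D")
    ],
    "erythromycin D": [
        ("eryG", "methylation of mycarose", "erythromycin B"),
        ("eryK", "hydroxylation at C-12", "erythromycin C"),
    ],
    "erythromycin C": [
        ("eryG", "methylation of mycarose", "erythromycin A")
    ],
}


def end_products(start, inactive=frozenset()):
    """Compounds reachable from `start` that no active gene product converts further.

    Simplifying assumption (not from the text): an active step always runs
    to completion, so only compounds with no active outgoing step accumulate.
    """
    seen, stack, ends = set(), [start], set()
    while stack:
        compound = stack.pop()
        if compound in seen:
            continue
        seen.add(compound)
        # Keep only the steps whose gene has not been inactivated.
        steps = [s for s in PATHWAY.get(compound, []) if s[0] not in inactive]
        if not steps:
            ends.add(compound)
        stack.extend(product for _gene, _change, product in steps)
    return ends


if __name__ == "__main__":
    for mutant in ([], ["eryG"], ["eryK"], ["eryBV"]):
        print(mutant or "wild-type", "->", sorted(end_products("6-dEB", set(mutant))))
```

In this simplified model an eryG mutant accumulates erythromycin C and an eryK mutant accumulates erythromycin B, which is the reasoning behind choosing a mutant strain to accumulate a preferred compound.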

Moreover, there are other useful organisms that can be employed to
hydroxylate and/or glycosylate the compounds of the invention. As described
above, the organisms can be mutants unable to produce the polyketide normally
produced in that organism, the fermentation can be carried out on plates or in large
fermentors, and the compounds produced can be chemically altered after
fermentation. Thus, Streptomyces venezuelae, which produces picromycin,
contains enzymes that can transfer a desosaminyl group to the C-5 hydroxyl and a
hydroxyl group to the C-12 position. In addition, S. venezuelae contains a
glucosylation activity that glucosylates the 2'-hydroxyl group of the desosamine
sugar. This latter modification reduces antibiotic activity, but the glucosyl residue
is removed by enzymatic action prior to release of the polyketide from the cell.
Another organism, S. narbonensis, contains the same modification enzymes as S.
venezuelae, except the C-12 hydroxylase. Thus, the present invention provides the
compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. narbonensis
and S. venezuelae.

Other organisms suitable for making compounds of the invention include
Micromonospora megalomicea (discussed above), Streptomyces antibioticus, S.
fradiae, and S. thermotolerans. S. antibioticus produces oleandomycin and
contains enzymes that hydroxylate the C-6 and C-12 positions, glycosylate the C-3
hydroxyl with oleandrose and the C-5 hydroxyl with desosamine, and form an
epoxide at C-8-C-8a. S. fradiae contains enzymes that glycosylate the C-5
hydroxyl with mycaminose and then the 4'-hydroxyl of mycaminose with
mycarose, forming a disaccharide. S. thermotolerans contains the same activities
as S. fradiae, as well as acylation activities. Thus, the present invention provides
the compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. antibioticus,
S. fradiae, and S. thermotolerans.
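
As an aid to choosing among the organisms just described, their stated modification activities can be tabulated and queried. The Python sketch below is illustrative only; the dictionary ACTIVITIES restates the activities named in the two preceding paragraphs, while the activity labels and the function hosts_with are hypothetical names introduced for this example.

```python
# Modification activities as stated in the text, keyed by organism.
# The label strings are illustrative shorthand, not standard nomenclature.
VENEZUELAE = {
    "C-5 desosaminylation",
    "C-12 hydroxylation",
    "2'-glucosylation of desosamine",
}
FRADIAE = {"C-5 mycaminosylation", "4'-mycarosylation of mycaminose"}

ACTIVITIES = {
    "S. venezuelae": VENEZUELAE,
    # Same enzymes as S. venezuelae, except the C-12 hydroxylase.
    "S. narbonensis": VENEZUELAE - {"C-12 hydroxylation"},
    "S. antibioticus": {
        "C-6 hydroxylation",
        "C-12 hydroxylation",
        "C-3 oleandrosylation",
        "C-5 desosaminylation",
        "C-8/C-8a epoxidation",
    },
    "S. fradiae": FRADIAE,
    # Same activities as S. fradiae, plus acylation activities.
    "S. thermotolerans": FRADIAE | {"acylation"},
}


def hosts_with(required):
    """Return, sorted by name, the organisms offering every required activity."""
    required = set(required)
    return sorted(org for org, acts in ACTIVITIES.items() if required <= acts)


if __name__ == "__main__":
    print(hosts_with({"C-5 desosaminylation", "C-12 hydroxylation"}))
```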

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene (as discussed in the next Section),
can be included on expression vectors suitable for expression of the encoded gene
products in Saccharopolyspora erythraea, Streptomyces antibioticus, S.
venezuelae, S. narbonensis, Micromonospora megalomicea, S. fradiae, and S.
thermotolerans.

A number of erythromycin high-producing strains of Saccharopolyspora
erythraea and Streptomyces fradiae have been developed, and in a preferred
embodiment, the megalomicin PKS and/or other megalomicin biosynthetic genes
are introduced into such strains (or erythromycin non-producing mutants thereof)
to provide the corresponding modified megalomicin compounds in high yields.
Those of skill in the art will appreciate that S. erythraea contains the desosamine
and mycarose biosynthetic and transfer genes as well as DEBS, which, as noted
above, makes the same macrolide aglycone, 6-dEB, as the megalomicin PKS. S.
erythraea does not make megosamine or contain its corresponding transferase
gene, and does not contain the acylation gene of Micromonospora megalomicea.
Finally, the S. erythraea eryG gene product converts mycarose to cladinose, which
does not occur in M. megalomicea. Thus, the present invention provides a wide
variety of S. erythraea recombinant host cells, including, for example, those that
contain:

(i) wild-type erythromycin biosynthetic genes with recombinant
megosamine biosynthetic and transfer genes, with and without megalomicin
acylation genes;

(ii) wild-type erythromycin biosynthetic genes except eryG, with
recombinant megosamine biosynthetic and transfer genes, with and without
megalomicin acylation genes; and

(iii) as in (i) and (ii), except that the eryA genes are inactive or deleted and
recombinant megA genes have been introduced.
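
The host cell types (i) through (iii) above amount to three independent yes/no choices. As a purely illustrative sketch, and not a description of any strain actually constructed, the following Python code enumerates the eight resulting combinations; the function name strain_designs and the dictionary keys are hypothetical labels chosen for this example.

```python
from itertools import product


def strain_designs():
    """Enumerate the S. erythraea host designs implied by items (i)-(iii).

    Three independent choices are assumed: whether eryG is retained, whether
    the megalomicin acylation genes are added, and whether eryA is replaced
    by recombinant megA genes. Every design carries the megosamine genes.
    """
    designs = []
    for ery_g, acylation, meg_a in product((True, False), repeat=3):
        designs.append({
            "ery genes": "wild-type" if ery_g else "wild-type except eryG",
            "megosamine biosynthetic and transfer genes": True,
            "megalomicin acylation genes": acylation,
            "PKS genes": (
                "recombinant megA (eryA inactive or deleted)"
                if meg_a else "native eryA"
            ),
        })
    return designs


if __name__ == "__main__":
    for number, design in enumerate(strain_designs(), start=1):
        print(number, design)
```

Items (i) and (ii) each account for two of the eight designs (with and without acylation genes), and item (iii) accounts for the remaining four.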

The invention provides other S. erythraea strains as well, including those 
in which any one or more of the erythromycin biosynthetic genes have been 
deleted or otherwise rendered inactive and in which at least one megalomicin 
biosynthetic gene has been introduced. 

For example, the present invention enables one to express the megosamine
genes in a Saccharopolyspora erythraea eryG mutant in which the erythromycin C
made by this mutant is converted to megalomicin A. Alternatively, one could use
an erythromycin C high-producing strain of S. erythraea in biotransformation
methods in which the erythromycin C is fed to a Streptomyces lividans strain
carrying only the megosamine biosynthesis and glycosyltransferase genes. As
another alternative, one could use a strain of S. lividans that carries suitable
erythromycin production genes along with the daunosamine biosynthesis genes
plus geneX and geneY of Figure 5, or all of the megosamine biosynthesis genes, to
produce megalomicin A.

All or some of the megalomicin gene cluster can be easily cloned under
control of a suitable promoter in pCK7 or pSET152, either in one or two plasmids,
and introduced into the Saccharopolyspora erythraea eryG mutant. The
actII-ORF4/actIp system and the phiC31/int system in pSET function well in this
organism (see Rowe et al., 1998, Gene 216:215-23, incorporated herein by
reference). Alternatively, the megosamine biosynthesis genes are introduced into
Streptomyces lividans on the same plasmids and the production of megalomicin A
or its precursor is mediated by bioconversion, done by feeding erythronolide B,
3-alpha-mycarosylerythronolide B, erythromycin D or erythromycin C to the S.
lividans strain.

Lack of adequate resistance to megalomicin A in S. erythraea or S.
lividans is not expected, because both organisms have MLS resistance genes
(ermE and mgt/lrm, respectively), which confer resistance to several 14-membered
macrolides (see Cundliffe, 1989, Annu. Rev. Microbiol. 43:207-33; Jenkins and
Cundliffe, 1991, Gene 705:55-62; and Cundliffe, 1992, Gene 115:75-84, each of
which is incorporated herein by reference). One can also readily determine the
level of resistance of the S. erythraea eryG mutant and the S. lividans host cells to
megalomicin A, both in plate tests and in liquid medium. One can repeat the
bioconversion method using an eryG mutant of a high erythromycin A producing
S. erythraea strain (or an eryB or eryC mutant, as necessary) to determine the level
at which megalomicin A can be produced. Furthermore, if experience shows that
high level megalomicin A production requires a higher level of resistance to this
macrolide than present in S. erythraea or S. lividans, the necessary megalomicin
self-resistance genes will be cloned from M. megalomicea and moved into either
one of the heterologous hosts. This will be straightforward work since
self-resistance genes are usually found in the cluster of macrolide biosynthesis
genes and can be identified by their homology to known macrolide resistance
genes and/or by the resistance phenotype they impart to a strain that normally is
sensitive.

Alternatively, geneX and geneY (Figure 5) can be added to cassettes
containing the relevant daunosamine (dnm) biosynthesis genes (Figure 5) to
provide the ability to make TDP-megosamine in vivo and attach it to an
erythromycin aglycone. The TDP-daunosamine biosynthesis genes can be
recloned from Streptomyces peucetius on two compatible and mutually selectable
plasmids. When an S. lividans strain containing these two plasmids and the dnmS
gene for TDP-daunosamine glycosyltransferase is grown in the presence of added
epsilon-rhodomycinone, its glycoside with L-daunosamine, called rhodomycin D,
is produced in good yield. Thus, bioconversion of one of the erythromycins to
megalomicin A should be observed when geneX and geneY are present. One can
construct all five combinations - the two N-dimethyltransferase genes and the three
glycosyltransferase genes - to discriminate geneX and geneY from those connected
with mycarose and desosamine biosynthesis and attachment in the megalomicin
pathway.

Because the timing of megosamine addition is unknown, one can test
erythronolide B, 3-alpha-mycarosylerythronolide B, erythromycin D and
erythromycin C as substrates provided to a strain that expresses the megosamine
biosynthetic and transferase genes. There is no need to test the C3''' and/or C4'''
acylated metabolites like megalomicin C1, because these metabolites are made
from megalomicin A and not the converse, based on the precedents in the
biosynthesis of tylosin (see Arisawa et al., 1994, Appl. Environ. Microbiol. 60:
2657-2661), carbomycin (see Epp et al., 1989, Gene 55:293-301), and
midecamycin (see Hara and Hutchinson, 1992, J. Bacteriol. 174: 5141-5144). If
C-6 glycosylation of erythronolide B or 3-alpha-mycarosylerythronolide B (Figure
5) happens before addition of desosamine to C-5, then the erythromycin genes
might not be able to complete formation of megalomicin A from some mono- or
diglycoside if the erythromycin glycosyltransferases cannot tolerate a C-6
glycoside. Although unexpected, such an outcome could be circumvented in
accordance with the methods of the invention by cloning further megalomicin
biosynthesis genes into the appropriate S. erythraea background or into S. lividans
- specifically, the necessary deoxysugar biosynthesis and attachment genes - to
create a recombinant strain that produces megalomicin A.

The acyltransferase gene that adds acetate or propionate to the C3''' or
C4''' positions of mycarose in megalomicin B, C1 and C2 (Figure 3) is contained
within the cosmids of the invention and can be identified by scanning the sequence
data for the megalomicin gene cluster to locate homologs of carE and mdmB or
their acyA homologs from the tylosin producer. The carE and acyA genes govern
C4''' acylation in the carbomycin and tylosin pathways, respectively. The
megalomicin homolog has the equivalent function in megalomicin biosynthesis
(but is specific for C3''' and C4''' acylation). The gene can be cloned under
control of a suitable promoter and introduced into S. lividans to produce the
desired acyl derivative of megalomicin A. Alternatively, introduction of the carE
gene can form megalomicin B. This gene can be cloned from the carbomycin,
spiramycin or tylosin producers.

If the amount of megalomicin produced by an S. erythraea or S. lividans or
other recombinant host cell is less than desired, yield can be improved by
optimizing the growth medium and fermentation conditions, by increasing
expression of the gene(s) that appear to be rate limiting, based on the level of
pathway intermediates that are accumulated by the recombinant strain constructed,
and by reconstructing the ery, dnm, and megalomicin biosynthesis genes on
vectors like pSET152 that can be integrated into the genome to provide a more
stable recombinant strain for strain improvement.

In another embodiment, the present invention provides recombinant
vectors encoding one or more of the megosamine, desosamine, and mycarose
biosynthetic and transfer genes and heterologous host cells comprising those
vectors. In this embodiment of the invention, the heterologous host cell is typically
a cell that is unable to produce the sugar and transfer it to a polyketide unless the
vector of the invention is introduced. For example, neither Streptomyces lividans
nor S. coelicolor is naturally capable of making megosamine, desosamine, or
mycarose or transferring those moieties to a polyketide. However, the present
invention provides recombinant Streptomyces lividans and S. coelicolor host cells
that are capable of making megosamine, desosamine, and/or mycarose and
transferring those moieties to a polyketide.

Moreover, additional recombinant gene products can be expressed in the
host cell to improve production of a desired polyketide. As but one non-limiting
example, certain of the recombinant PKS proteins of the invention may produce a
polyketide other than or in addition to the predicted polyketide, because the
polyketide is cleaved from the PKS by the thioesterase (TE) domain in module 6
prior to processing by other domains on the PKS, in particular, any KR, DH,
and/or ER domains in module 6. The production of the predicted polyketide can
be increased in such instances by deleting the TE domain coding sequences from
the gene and, optionally, expressing the TE domain as a separate protein. See
Gokhale et al., Feb. 1999, "Mechanism and specificity of the terminal thioesterase
domain from the erythromycin polyketide synthase," Chem. & Biol. 6: 117-125,
incorporated herein by reference.

Thus, in one important aspect, the present invention provides methods, 
expression vectors, and recombinant host cells that enable the production of 
megalomicin and hydroxylated and glycosylated derivatives of megalomicin in
heterologous host cells. The present invention also provides methods for making a 
wide variety of polyketides derived in part from the megalomicin PKS or other 
biosynthetic genes, as described in the following Section. 

Section VI: Hybrid PKS Genes

The present invention provides recombinant DNA compounds encoding 
each of the domains of each of the modules of the megalomicin PKS as well as the 
other megalomicin biosynthetic enzymes. The availability of these compounds 
permits their use in recombinant procedures for production of desired portions of
the megalomicin PKS fused to or expressed in conjunction with all or a portion of
a heterologous PKS and, optionally, one or more polyketide modification 
enzymes. These compounds also permit the modification of polyketides with the 
various megalomicin modification enzymes. The resulting hybrid PKS can then be 
expressed in a host cell to produce a desired polyketide or modified form thereof. 

Thus, in accordance with the methods of the invention, a portion of the
megalomicin biosynthetic gene coding sequence that encodes a particular activity
can be isolated and manipulated, for example, to replace the corresponding region
in a different modular PKS gene or modification enzyme gene. In addition, coding
sequences for individual proteins, modules, domains, and portions thereof of the
megalomicin PKS can be ligated into suitable expression systems and used to
produce the portion of the protein encoded. The resulting protein can be isolated
and purified or can be employed in situ to effect polyketide synthesis.
Depending on the host for the recombinant production of the domain, module,
protein, or combination of proteins, suitable control sequences such as promoters,
termination sequences, enhancers, and the like are ligated to the nucleotide
sequence encoding the desired protein in the construction of the expression vector,
as described above.

In one important embodiment, the invention thus provides hybrid PKS
enzymes and the corresponding recombinant DNA compounds that encode those
hybrid PKS enzymes. For purposes of the invention, a hybrid PKS is a
recombinant PKS that comprises all or part of one or more extender modules,
loading module, and/or thioesterase/cyclase domain of a first PKS and all or part
of one or more extender modules, loading module, and/or thioesterase/cyclase
domain of a second PKS. In one preferred embodiment, the first PKS is most but
not all of the megalomicin PKS, and the second PKS is only a portion of a
non-megalomicin PKS. An illustrative example of such a hybrid PKS includes a
megalomicin PKS in which the megalomicin PKS loading module has been
replaced with a loading module of another PKS. Another example of such a hybrid
PKS is a megalomicin PKS in which the AT domain of extender module 3 is
replaced with an AT domain that binds only malonyl CoA. In another preferred
embodiment, the first PKS is most but not all of a non-megalomicin PKS, and the
second PKS is only a portion of the megalomicin PKS. An illustrative example of
such a hybrid PKS includes a rapamycin PKS in which an AT specific for malonyl
CoA is replaced with the AT from the megalomicin PKS specific for
methylmalonyl CoA. Other illustrative hybrid PKSs of the invention are described
below.

Those of skill in the art will recognize that all or part of either the first or
second PKS in a hybrid PKS of the invention need not be isolated from a naturally
occurring source. For example, only a small portion of an AT domain determines
its specificity. See PCT patent application No. WO US99/15047, and Lau et al.,
infra, incorporated herein by reference. The state of the art in DNA synthesis
allows the artisan to construct de novo DNA compounds of size sufficient to
construct a useful portion of a PKS module or domain. Thus, the desired
derivative coding sequences can be synthesized using standard solid phase
synthesis methods such as those described by Jaye et al., 1984, J. Biol. Chem. 259:
6331, and instruments for automated synthesis are available commercially from,
for example, Applied Biosystems, Inc. For purposes of the invention, such
synthetic DNA compounds are deemed to be a portion of a PKS.

With this general background regarding hybrid PKSs of the invention, one
can better appreciate the benefit provided by the DNA compounds of the invention
that encode the individual domains, modules, and proteins that comprise the
megalomicin PKS. As described above, the megalomicin PKS is comprised of a
loading module, six extender modules composed of a KS, AT, ACP, and zero,
one, two, or three KR, DH, and ER domains, and a thioesterase domain. The DNA
compounds of the invention that encode these domains individually or in
combination are useful in the construction of the hybrid PKS encoding DNA
compounds of the invention. For example, a DNA compound of the invention that
encodes an extender module or portion of an extender module is useful in the
construction of a coding sequence that encodes a protein subcomponent of a PKS.
The DNA compound of the invention that comprises a coding sequence of a PKS
subunit protein is useful in the construction of an expression vector that drives
expression of the subunit in a host cell that expresses the other subunits and so
produces a functional PKS.

The recombinant DNA compounds of the invention that encode the
loading module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
loading module is inserted into a DNA compound that comprises the coding
sequence for one or more heterologous PKS extender modules. The resulting
construct, in which the coding sequence for the loading module of the
heterologous PKS is replaced by the coding sequence for the megalomicin
PKS loading module, provides a novel PKS. Examples of such heterologous PKS
coding sequences include the DEBS, rapamycin, FK-506, FK-520, rifamycin, and
avermectin PKS coding sequences. In
another embodiment, a DNA compound comprising a sequence that encodes the
megalomicin PKS loading module is inserted into a DNA compound that
comprises the coding sequence for the megalomicin PKS or a recombinant
megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion of the loading module coding sequence
is utilized in conjunction with a heterologous coding sequence. In this embodiment,
the invention provides, for example, replacing the methylmalonyl CoA (propionyl)
specific AT with a malonyl CoA (acetyl), ethylmalonyl CoA (butyryl), or other
CoA specific AT. In addition, the AT and/or ACP can be replaced by another AT
and/or another ACP or an inactivated KS, such as a KS°, an AT, and/or another
ACP. The resulting heterologous loading module coding sequence can be utilized
in conjunction with a coding sequence for a PKS that synthesizes megalomicin, a
megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the first 
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS first
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
first extender module of the megalomicin PKS or the latter is merely added to
coding sequences for modules of the heterologous PKS, provides a novel PKS
coding sequence. In another embodiment, a DNA compound comprising a
sequence that encodes the first extender module of the megalomicin PKS is
inserted into a DNA compound that comprises coding sequences for the
megalomicin PKS or a recombinant megalomicin PKS that produces a
megalomicin derivative.

In another embodiment, a portion or all of the first extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a 
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting (which includes inactivating) the
KR; inserting a DH or a DH and ER; and/or replacing the KR with another KR, a
DH and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a gene
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous first extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide. 

Those of skill in the art will recognize, however, that deletion of the KR 
domain of extender module 1 or insertion of a DH domain or DH and KR domains
into extender module 1 will prevent the typical cyclization of the polyketide at the
hydroxyl group created by the KR if such hybrid module is employed as a first
extender module in a hybrid PKS or is otherwise involved in producing a portion
of the polyketide at which cyclization is to occur. Such deletions or insertions can
be useful, however, to create linear molecules or to induce cyclization at another
site in the molecule. 

As noted above, the invention also provides recombinant PKSs and 
recombinant DNA compounds and vectors that encode such PKSs in which the 
KS domain of the first extender module has been inactivated. Such constructs are
typically expressed in translational reading frame with the first two extender
modules on a single protein, with the remaining modules and domains of a
megalomicin, megalomicin derivative, or hybrid PKS expressed as one or more,
typically two, proteins to form the multi-protein functional PKS. The utility of
these constructs is that host cells expressing, or cell free extracts containing, the
PKS encoded thereby can be fed or supplied with N-acylcysteamine thioesters of
precursor molecules to prepare megalomicin derivative compounds. See U.S.
patent application Serial No. 09/492,733, filed 27 Jan. 2000, and PCT publication
Nos. WO 00/44717, 99/03986 and 97/02358, each of which is incorporated herein
by reference. 

The recombinant DNA compounds of the invention that encode the second
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
second extender module is inserted into a DNA compound that comprises the
coding sequence for a heterologous PKS. The resulting construct, in which the
coding sequence for a module of the heterologous PKS is either replaced by that
for the second extender module of the megalomicin PKS or the latter is merely
added to coding sequences for the modules of the heterologous PKS, provides a
novel PKS. In another embodiment, a DNA compound comprising a sequence that
encodes the second extender module of the megalomicin PKS is inserted into a
DNA compound that comprises the coding sequences for the megalomicin PKS or
a recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the second extender module
coding sequence is utilized in conjunction with other PKS coding sequences to 
create a hybrid module. In this embodiment, the invention provides, for example, 
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
replacing the KR with another KR, a KR and a DH, or a KR, DH, and ER; and/or
inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a
coding sequence for a PKS that produces a polyketide other than megalomicin, or
from chemical synthesis. The resulting heterologous second extender module
coding sequence can be utilized in conjunction with a coding sequence from a
PKS that synthesizes megalomicin, a megalomicin derivative, or another
polyketide.

The recombinant DNA compounds of the invention that encode the third 
extender module of the megalomicin PKS and the corresponding polypeptides 
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS third
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
third extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the third extender module of the megalomicin PKS is inserted into a DNA
compound that comprises coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the third extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting the inactive KR; and/or
replacing the KR with an active KR, or a KR and DH, or a KR, DH, and ER. In
addition, the KS and/or ACP can be replaced with another KS and/or ACP. In
each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER,
or ACP coding sequence can originate from a coding sequence for another module
of the megalomicin PKS, from a gene for a PKS that produces a polyketide other
than megalomicin, or from chemical synthesis. The resulting heterologous third
extender module coding sequence can be utilized in conjunction with a coding
sequence for a PKS that synthesizes megalomicin, a megalomicin derivative, or
another polyketide.

The recombinant DNA compounds of the invention that encode the fourth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS fourth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
fourth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the fourth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion of the fourth extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting or inactivating any one, two, or all
three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER,
DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER. In addition,
the KS and/or ACP can be replaced with another KS and/or ACP. In each of these
replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding
sequence can originate from a coding sequence for another module of the
megalomicin PKS (except for the DH and ER domains), from a coding sequence
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous fourth extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the fifth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS fifth 
extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
fifth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the fifth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequence for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the fifth extender module 
coding sequence is utilized in conjunction with other PKS coding sequences to 
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH
and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding
sequence for a PKS that produces a polyketide other than megalomicin, or from
chemical synthesis. The resulting heterologous fifth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide. 

The recombinant DNA compounds of the invention that encode the sixth 
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS sixth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
sixth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the sixth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the sixth extender module 
coding sequence is utilized in conjunction with other PKS coding sequences to 
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting or inactivating the KR or
replacing the KR with another KR, a KR and DH, or a KR, DH, and an ER; and/or
inserting a DH or a DH and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding
sequence for a PKS that produces a polyketide other than megalomicin, or from
chemical synthesis. The resulting heterologous sixth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The sixth extender module of the megalomicin PKS is followed by a 
thioesterase domain. This domain is important in the cyclization of the polyketide 
and its cleavage from the PKS. The present invention provides recombinant DNA 
compounds that encode hybrid PKS enzymes in which the megalomicin PKS is
fused to a heterologous thioesterase or a heterologous PKS is fused to the
megalomicin PKS thioesterase. Thus, for example, a thioesterase domain coding
sequence from another PKS gene can be inserted at the end of the sixth (or other
final) extender module coding sequence in recombinant DNA compounds of the
invention or the megalomicin PKS thioesterase can be similarly fused to a
heterologous PKS. Recombinant DNA compounds encoding this thioesterase
domain are useful in constructing DNA compounds that encode the megalomicin
PKS, a PKS that produces a megalomicin derivative, and a PKS that produces a
polyketide other than megalomicin or a megalomicin derivative.

Thus, the hybrid modules of the invention are incorporated into a PKS to 
provide a hybrid PKS of the invention. A hybrid PKS of the invention can result 
not only: 

(i) from fusions of heterologous domain (where heterologous means the
domains in a module are derived from at least two different naturally occurring
modules) coding sequences to produce a hybrid module coding sequence
contained in a PKS gene whose product is incorporated into a PKS,
but also:

(ii) from fusions of heterologous modules (where heterologous module
means two modules are adjacent to one another that are not adjacent to one
another in naturally occurring PKS enzymes) coding sequences to produce a
hybrid coding sequence contained in a PKS gene whose product is incorporated
into a PKS,

(iii) from expression of one or more megalomicin PKS genes with one or
more non-megalomicin PKS genes, including both naturally occurring and
recombinant non-megalomicin PKS genes, and

(iv) from combinations of the foregoing. 

Various hybrid PKSs of the invention illustrating these various alternatives are 
described herein. 

An example of a hybrid PKS comprising fused modules results from
fusion of the loading module of either the DEBS PKS or the narbonolide PKS (see
PCT patent application No. US99/11814, incorporated herein by reference) with
extender modules 1 and 2 of the megalomicin PKS to produce a hybrid megAI
gene. Co-expression of either one of these two hybrid megAI genes with the
megAII and megAIII genes in suitable host cells, such as Streptomyces lividans,
results in expression of a hybrid PKS of the invention that produces
6-deoxyerythronolide B (the polyketide product of the natural megA genes) in
recombinant host cells. Co-expression of either one of these two hybrid megAI
genes with the eryAII and eryAIII genes similarly results in the production of
6-dEB, while co-expression with the analogous narbonolide PKS genes, picAII,
picAIII and picAIV, results in the production of 3-deoxy-3-oxo-6-dEB
(3-keto-6-dEB), useful in the production of ketolides, compounds with potent
anti-bacterial activity.

Another example of a hybrid PKS comprising a hybrid module is prepared 
by co-expressing the megAI and megAII genes with a megAIII hybrid gene
encoding extender module 5 and the KS and AT of extender module 6 of the
megalomicin PKS fused to the ACP of module 6 and the TE of the narbonolide
PKS. The resulting hybrid PKS of the invention produces 3-keto-6-dEB. This
compound can also be prepared by a recombinant megalomicin derivative PKS of
the invention in which the KR domain of module 6 of the megalomicin PKS has
been deleted. Moreover, the invention provides hybrid PKSs in which not only the
above changes have been made but also the AT domain of module 6 has been
replaced with a malonyl-specific AT. These hybrid PKSs produce
2-desmethyl-3-deoxy-3-oxo-6-dEB, a useful intermediate in the preparation of
2-desmethyl ketolides, compounds with potent antibiotic activity.

Another illustrative example of a hybrid PKS includes the hybrid PKS of 
the invention resulting only from the latter change in the hybrid PKS just
described. Thus, co-expression of the megAI and megAII genes with a hybrid
megAIII gene in which the AT domain of module 6 has been replaced by a
malonyl-specific AT results in the expression of a hybrid PKS that produces
2-desmethyl-6-dEB in recombinant host cells. This compound is a useful
intermediate for making 2-desmethyl erythromycins in recombinant host cells of
the invention, as well as for making 2-desmethyl semi-synthetic ketolides.

While many of the hybrid PKSs described above are composed primarily 
of megalomicin PKS proteins, those of skill in the art recognize that the present 
invention provides many different hybrid PKSs, including those composed of only 
a small portion of the megalomicin PKS. For example, the present invention
provides a hybrid PKS in which a hybrid eryAI gene that encodes the megalomicin
PKS loading module fused to extender modules 1 and 2 of DEBS is coexpressed
with the eryAII and eryAIII genes. The resulting hybrid PKS produces 6-dEB, the
product of the native DEBS. When the construct is expressed in
Saccharopolyspora erythraea host cells (either via integration into the
chromosome or via a vector that encodes the hybrid PKS), the resulting
recombinant host cell of the invention produces erythromycins. Another
illustrative example is the hybrid PKS of the invention composed of the megAI
and eryAII and eryAIII gene products. This construct is also useful in expressing
erythromycins in Saccharopolyspora erythraea host cells. In a preferred
embodiment, the S. erythraea host cells are eryAI mutants that do not produce
6-deoxyerythronolide B.

Another example is the hybrid PKS of the invention composed of the 
products of the picAI and picAII genes (which encode the two proteins that
comprise the loading module and extender modules 1 - 4, inclusive, of the
narbonolide PKS) and the megAIII gene. The resulting hybrid PKS produces the
macrolide aglycone 3-hydroxy-narbonolide in Streptomyces lividans host cells and
the corresponding erythromycins in Saccharopolyspora erythraea host cells.

Each of the foregoing hybrid PKS enzymes of the invention, and the hybrid
PKS enzymes of the invention generally, can be expressed in a host cell that also
expresses a functional oleP gene product. The oleP gene encodes an oleandomycin
modification enzyme, and expression of the gene together with a hybrid PKS of
the invention provides the compounds of the invention in which a C-8 hydroxyl, a
C-8a or C-8-C-8a epoxide is present.

Recombinant methods for manipulating modular PKS genes to make 
hybrid PKS enzymes are described in U.S. Patent Nos. 5,672,491; 5,843,718;
5,830,750; and 5,712,146; and in PCT publication Nos. 98/49315 and 97/02358,
each of which is incorporated herein by reference. A number of genetic
engineering strategies have been used with DEBS to demonstrate that the
structures of polyketides can be manipulated to produce novel natural products,
primarily analogs of the erythromycins (see the patent publications referenced
supra and Hutchinson, 1998, Curr. Opin. Microbiol. 1:319-329, and Baltz, 1998,
Trends Microbiol. 6:76-83, incorporated herein by reference). Because of the
similar activity of the megalomicin PKS and DEBS (both PKS enzymes produce
the macrolide aglycone 6-dEB), these methods can be readily applied to the
recombinant megalomicin PKS genes of the invention.

These techniques include: (i) deletion or insertion of modules to control
chain length, (ii) inactivation of reduction/dehydration domains to bypass
beta-carbon processing steps, (iii) substitution of AT domains to alter starter and
extender units, (iv) addition of reduction/dehydration domains to introduce
catalytic activities, and (v) substitution of ketoreductase (KR) domains to control
hydroxyl stereochemistry. In addition, engineered blocked mutants of DEBS have
been used for precursor directed biosynthesis of analogs that incorporate
synthetically derived starter units. For example, more than 100 novel polyketides
were produced by engineering single and combinatorial changes in multiple
modules of DEBS. Hybrid PKS enzymes based on DEBS with up to three catalytic
domain substitutions were constructed by cassette mutagenesis, in which various
DEBS domains were replaced with domains from the rapamycin PKS (see
Schwecke et al., 1995, Proc. Natl. Acad. Sci. USA 92: 7839-7843, incorporated
herein by reference) or one or more of the DEBS KR domains was deleted.
Functional single domain replacements or deletions were combined to generate
DEBS enzymes with double and triple catalytic domain substitutions (see
McDaniel et al., 1999, Proc. Natl. Acad. Sci. USA 96: 1846-1851, incorporated
herein by reference). By providing the analogous megalomicin/rapamycin hybrid
PKS enzymes, the present invention provides alternative means to make these
polyketides.
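
The effect of inactivating or adding reduction/dehydration domains, as in
techniques (ii) and (iv), can be illustrated with a minimal Python sketch. It
assumes the commonly described reductive loop of a modular PKS, in which the set
of active KR, DH, and ER domains in a module determines the state of the beta
carbon left by that module. The function name and the outcome labels are
hypothetical and are used here only for illustration.

```python
# Illustrative sketch only: maps the active reductive domains of one module
# to the beta-carbon state that module is expected to leave in the chain.
# The labels and the function name are assumptions, not taken from the text.
REDUCTIVE_OUTCOME = {
    frozenset(): "keto",                           # no reduction
    frozenset({"KR"}): "hydroxyl",                 # ketoreduction only
    frozenset({"KR", "DH"}): "double bond",        # reduction, then dehydration
    frozenset({"KR", "DH", "ER"}): "methylene",    # full reductive cycle
}


def beta_carbon_state(active_domains):
    """Return the expected beta-carbon state for a module's active domains."""
    # Domains such as KS, AT, and ACP do not affect beta-carbon processing.
    reductive = frozenset(active_domains) & {"KR", "DH", "ER"}
    if reductive not in REDUCTIVE_OUTCOME:
        raise ValueError(f"unsupported reductive domain set: {sorted(reductive)}")
    return REDUCTIVE_OUTCOME[reductive]


# A module with an active KR leaves a hydroxyl group.
print(beta_carbon_state({"KS", "AT", "KR", "ACP"}))
# Deleting or inactivating that KR leaves a keto group instead.
print(beta_carbon_state({"KS", "AT", "ACP"}))
# Replacing the KR with a KR, DH, and ER segment gives a fully reduced carbon.
print(beta_carbon_state({"KS", "AT", "DH", "ER", "KR", "ACP"}))
```

Under these assumptions, deleting the KR of a module corresponds to moving from
the "hydroxyl" entry to the "keto" entry, which is consistent with the 3-keto
compounds described above for deletion of the KR domain of module 6.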

Methods for generating libraries of polyketides have been greatly improved 
by cloning PKS genes as a set of three or more mutually selectable plasmids, each
carrying a different wild-type or mutant PKS gene, then introducing all possible
combinations of the plasmids with wild-type, mutant, and hybrid PKS coding
sequences into the same host (see U.S. patent application Serial No. 60/129,731,
filed 16 Apr. 1999, and PCT Pub. No. 98/27203, each of which is incorporated
herein by reference). This method can also incorporate the use of a KS1° mutant,
which by mutational biosynthesis can produce polyketides made from diketide
starter units (see Jacobsen et al., 1997, Science 277: 367-369, incorporated herein
by reference), as well as the use of a truncated gene that leads to 12-membered
macrolides or an elongated gene that leads to 16-membered ketolides. Moreover,
by utilizing in addition one or more vectors that encode glycosyl biosynthesis and
transfer genes, such as those of the present invention for megosamine,
desosamine, oleandrose, cladinose, and/or mycarose (in any combination), a large
collection of glycosylated polyketides can be prepared.
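
The size of a library built from mutually selectable plasmids can be
illustrated with a minimal Python sketch. It assumes three plasmids, each
available in several wild-type, mutant, or hybrid variants, and enumerates every
combination that takes exactly one variant per plasmid. The plasmid and variant
names below are hypothetical placeholders and do not refer to actual constructs.

```python
from itertools import product

# Hypothetical variant names for three mutually selectable plasmids.
# None of these names comes from the specification; they are placeholders.
plasmid_variants = {
    "plasmid_1": ["megAI_wild_type", "megAI_variant_a", "megAI_variant_b"],
    "plasmid_2": ["megAII_wild_type", "megAII_variant_a"],
    "plasmid_3": ["megAIII_wild_type", "megAIII_variant_a", "megAIII_variant_b"],
}


def enumerate_library(variants):
    """Return every combination that takes one variant from each plasmid."""
    names = list(variants)
    return [
        dict(zip(names, combo))
        for combo in product(*(variants[name] for name in names))
    ]


library = enumerate_library(plasmid_variants)
# The library size is the product of the variant counts: 3 * 2 * 3.
print(len(library))
```

Because the number of combinations is the product of the variant counts, a
modest number of variants per plasmid yields a large number of host strains,
each expected to express a different PKS.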

The following Table lists references describing illustrative PKS genes and 
corresponding enzymes that can be utilized in the construction of the recombinant 
hybrid PKSs of the invention and the corresponding DNA compounds that encode
them. Also presented are various references describing tailoring enzymes and
corresponding genes that can be employed in accordance with the methods of the
invention.

Avermectin
    U.S. Pat. No. 5,252,474 to Merck.
    MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied
    Molecular Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A
    Comparison of the Genes Encoding the Polyketide Synthases for Avermectin,
    Erythromycin, and Nemadectin.
    MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the
    Streptomyces avermitilis genes encoding the avermectin polyketide synthase.

Candicidin (FR008)
    Hu et al., 1994, Mol. Microbiol. 14: 163-172.

Epothilone
    PCT Pub. No. 00/031247 to Kosan.

Erythromycin
    PCT Pub. No. 93/13663 to Abbott.
    U.S. Pat. No. 5,824,513 to Abbott.
    Donadio et al., 1991, Science 252: 675-9.
    Cortes et al., 8 Nov. 1990, Nature 348: 176-8, An unusually large
    multifunctional polypeptide in the erythromycin producing polyketide synthase of
    Saccharopolyspora erythraea.

Glycosylation Enzymes
    PCT Pub. No. 97/23630 to Abbott.

FK-506
    Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone
    ring of the immunosuppressant FK506, Eur. J. Biochem. 256: 528-534.
    Motamedi et al., 1997, Structural organization of a multifunctional
    polyketide synthase involved in the biosynthesis of the macrolide
    immunosuppressant FK506, Eur. J. Biochem. 244: 74-80.

Methyltransferase
    U.S. Pat. No. 5,264,355, issued 23 Nov. 1993, Methylating enzyme from
    Streptomyces MA6858. 31-O-desmethyl-FK506 methyltransferase.
    Motamedi et al., 1996, Characterization of methyltransferase and
    hydroxylase genes involved in the biosynthesis of the immunosuppressants FK506
    and FK520, J. Bacteriol. 178: 5243-5248.

FK-520
    PCT Pub. No. 00/20601 to Kosan.
    See also Nielsen et al., 1991, Biochem. 30: 5789-96 (enzymology of
    pipecolate incorporation).

Lovastatin
    U.S. Pat. No. 5,744,350 to Merck.

Narbomycin (and Picromycin)
    PCT Pub. No. WO US99/61599 to Kosan.

Nemadectin
    MacNeil et al., 1993, supra.

Niddamycin
    Kakavas et al., 1997, Identification and characterization of the niddamycin
    polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179:
    7515-7522.

Oleandomycin
    Swan et al., 1994, Characterization of a Streptomyces antibioticus gene
    encoding a type I polyketide synthase which has an unusual coding sequence,
    Mol. Gen. Genet. 242: 358-362.
    PCT Pub. No. 00/026349 to Kosan.
    Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal
    region involved in oleandomycin biosynthesis, which encodes two
    glycosyltransferases responsible for glycosylation of the macrolactone ring,
    Mol. Gen. Genet. 259(3): 299-308.

Platenolide
    EP Pub. No. 791,656 to Lilly.

Rapamycin
    Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the
    polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92: 7839-7843.
    Aparicio et al., 1996, Organization of the biosynthetic gene cluster for
    rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in
    the modular polyketide synthase, Gene 169: 9-16.

Rifamycin
    August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic
    rifamycin: deductions from the molecular analysis of the rif biosynthetic gene
    cluster of Amycolatopsis mediterranei S699, Chemistry & Biology 5(2): 69-79.

Soraphen
    U.S. Pat. No. 5,716,849 to Novartis.
    Schupp et al., 1995, J. Bacteriology 177: 3673-3679, A Sorangium
    cellulosum (Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide
    Antibiotic Soraphen A: Cloning, Characterization, and Homology to Polyketide
    Synthase Genes from Actinomycetes.

Spiramycin
    U.S. Pat. No. 5,098,837 to Lilly.

Activator Gene
    U.S. Pat. No. 5,514,544 to Lilly.

Tylosin
    EP Pub. No. 791,655 to Lilly.
    Kuhstoss et al., 1996, Gene 183: 231-6, Production of a novel polyketide
    through the construction of a hybrid polyketide synthase.
    U.S. Pat. No. 5,876,991 to Lilly.

Tailoring enzymes
    Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355,
    Analysis of five tylosin biosynthetic genes from the tylBA region of the
    Streptomyces fradiae genome.

As the above Table illustrates, there are a wide variety of PKS genes that serve as 
readily available sources of DNA and sequence information for use in constructing 
the hybrid PKS-encoding DNA compounds of the invention. 

In constructing hybrid PKSs of the invention, certain general methods may 
be helpful. For example, it is often beneficial to retain the framework of the 
module to be altered to make the hybrid PKS. Thus, if one desires to add DH and 
ER functionalities to a module, it is often preferred to replace the KR domain of 
the original module with a cognate KR, DH, and ER domain-containing segment
from another module, instead of merely inserting DH and ER domains. One can
alter the stereochemical specificity of a module by replacement of the KS domain
with a KS domain from a module that specifies a different stereochemistry. See
Lau et al., 1999, "Dissecting the role of acyltransferase domains of modular
polyketide synthases in the choice and stereochemical fate of extender units,"
Biochemistry 38(5): 1643-1651, incorporated herein by reference. One can alter the
specificity of an AT domain by changing only a small segment of the domain. See
Lau et al., supra. One can also take advantage of known linker regions in PKS
proteins to link modules from two different PKSs to create a hybrid PKS. See
Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting Intermodular
Communication in Polyketide Synthases," Science 284: 482-485, incorporated
herein by reference.

The hybrid PKS-encoding DNA compounds of the invention can be and
often are hybrids of more than two PKS genes. Even where only two genes are
used, there are often two or more modules in the hybrid gene in which all or part
of the module is derived from a second (or third) PKS gene. Thus, as one
illustrative example, the invention provides a hybrid PKS that contains the
naturally occurring loading module and thioesterase domain as well as extender
modules one, two, four, and six of the megalomicin PKS and further contains
hybrid or heterologous extender modules three and five. Hybrid or heterologous
extender modules three and five contain AT domains specific for malonyl CoA 
and derived from, for example, the rapamycin PKS genes. 
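
The consequence of such AT replacements for the methyl branching pattern can
be illustrated with a minimal Python sketch. It assumes a six-module PKS that
produces a 6-dEB-type chain, in which the two-carbon unit added by extender
module m supplies carbons 2(6 - m) + 1 and 2(6 - m) + 2, and in which a
methylmalonyl extender leaves a methyl branch on the even-numbered carbon of
that unit. This numbering is an assumption adopted for illustration (it agrees
with the statement above that a malonyl-specific AT in module 6 gives a
2-desmethyl product), and the function and variable names are hypothetical.

```python
# Illustrative sketch only; the numbering rule is an assumption stated in the
# lead-in text, and all names here are hypothetical.
NATIVE_EXTENDERS = {module: "methylmalonyl" for module in range(1, 7)}


def methyl_branch_positions(extender_by_module):
    """Return carbon positions expected to carry a methyl branch.

    Modules are assumed to be numbered 1..n in order of action, so the last
    module contributes C-1/C-2 and the first extender module C-11/C-12.
    """
    n_modules = len(extender_by_module)
    positions = []
    for module, unit in extender_by_module.items():
        if unit == "methylmalonyl":
            positions.append(2 * (n_modules - module + 1))
    return sorted(positions)


# Hybrid in which modules three and five carry malonyl CoA specific ATs.
hybrid = dict(NATIVE_EXTENDERS)
hybrid[3] = "malonyl"
hybrid[5] = "malonyl"

print(methyl_branch_positions(NATIVE_EXTENDERS))
print(methyl_branch_positions(hybrid))
```

Under these assumptions, the hybrid in the example above would be expected to
lack the methyl branches contributed by modules three and five while retaining
those contributed by modules one, two, four, and six.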

The invention also provides libraries of PKS genes, PKS proteins, and 
ultimately, of polyketides, that are constructed by generating modifications in the
megalomicin PKS so that the protein complexes produced have altered activities
in one or more respects and thus produce polyketides other than the natural
product of the PKS. Novel polyketides may thus be prepared, or polyketides in
general prepared more readily, using this method. By providing a large number of
different genes or gene clusters derived from a naturally occurring PKS gene
cluster, each of which has been modified in a different way from the native cluster,
an effectively combinatorial library of polyketides can be produced as a result of
the multiple variations in these activities. As will be further described below, the
metes and bounds of this embodiment of the invention can be described on the
polyketide, protein, and the encoding nucleotide sequence levels.

As described above, a modular PKS "derived from" the megalomicin or 
other naturally occurring PKS includes a modular PKS (or its corresponding 
encoding gene(s)) that retains the scaffolding of the utilized portion of the 
naturally occurring gene. Not all modules need be included in the constructs;
however, the constructs can also comprise more than six modules. On the constant
scaffold, at least one enzymatic activity is mutated, deleted, replaced, or inserted
so as to alter the activity of the resulting PKS relative to the original (native) PKS.
Alteration results when these activities are deleted or are replaced by a different
version of the activity, or simply mutated in such a way that a polyketide other
than the natural product results from these collective activities. This occurs
because there has been a resulting alteration of the starter unit and/or extender
unit, stereochemistry, chain length or cyclization, and/or reductive or dehydration
cycle outcome at a corresponding position in the product polyketide. Where a
deleted activity is replaced, the replacement activity may come from a
corresponding activity in a different naturally occurring PKS or from a different
region of the megalomicin PKS. Any or all of the megalomicin PKS genes may be
included in the derivative or portions of any of these may be included, but the
scaffolding of a functional PKS protein is retained in whatever derivative is
constructed. The derivative preferably contains a thioesterase activity from the
megalomicin or another PKS. 

Thus, a PKS derived from the megalomicin PKS includes a PKS that 
contains the scaffolding of all or a portion of the megalomicin PKS. The derived 
PKS also contains at least two extender modules that are functional, preferably 
three extender modules, and more preferably four or more extender modules, and
most preferably six extender modules. The derived PKS also contains mutations,
deletions, insertions, or replacements of one or more of the activities of the
functional modules of the megalomicin PKS so that the nature of the resulting
polyketide is altered at both the protein and DNA sequence levels. Particularly
preferred embodiments include those wherein a KS, AT, or ACP domain has been
deleted or replaced by a version of the activity from a different PKS or from
another location within the same PKS. Also preferred are derivatives where at
least one non-condensation cycle enzymatic activity (KR, DH, or ER) has been
deleted or added or wherein any of these activities has been mutated so as to 
change the structure of the polyketide synthesized by the PKS. 

Conversely, also included within the definition of a PKS derived from the 
megalomicin PKS are functional non-megalomicin PKS modules or their
encoding genes wherein at least one domain or coding sequence therefor of a
megalomicin PKS module has been inserted. Exemplary is the use of the
megalomicin AT for extender module 2, which accepts a methylmalonyl CoA
extender unit rather than malonyl CoA, to replace a malonyl specific AT in
another PKS. Other examples include insertion of portions of non-condensation
cycle enzymatic activities or other regions of megalomicin synthase activity into a
heterologous PKS at both the DNA and protein levels. 

Thus, there are at least five degrees of freedom for constructing a hybrid 
PKS in terms of the polyketide that will be produced. First, the polyketide chain 
length is determined by the number of extender modules in the PKS, and the
present invention includes hybrid PKSs that contain 6, as well as fewer or more
than 6, extender modules. Second, the nature of the carbon skeleton of the
polyketide is determined by the specificities of the acyl transferases that
determine the nature of the extender units at each position, e.g., malonyl,
methylmalonyl, ethylmalonyl, or other substituted malonyl. Third, the loading
module specificity also has an effect
on the resulting carbon skeleton of the polyketide. The loading module may use a
different starter unit, such as acetyl, butyryl, and the like. As noted above, another
method for varying loading module specificity involves inactivating the KS
activity in extender module 1 (KS1) and providing alternative substrates, called
diketides, that are chemically synthesized analogs of extender module 1 diketide
products, for extender module 2. This approach was illustrated in PCT publication
Nos. WO 97/02358 and WO 99/03986, incorporated herein by reference, wherein the KS1
activity was inactivated through mutation. Fourth, the oxidation state at various
positions of the polyketide will be determined by the dehydratase and reductase
portions of the modules. This will determine the presence and location of ketone
and alcohol moieties and C-C double bonds or C-C single bonds in the polyketide. 

Finally, the stereochemistry of the resulting polyketide is a function of 
three aspects of the synthase. The first aspect is related to the AT/KS specificity 
associated with substituted malonyls as extender units, which affects
stereochemistry only when the reductive cycle is missing or when it contains only
a ketoreductase, as the dehydratase would abolish chirality. Second, the specificity
of the ketoreductase may determine the chirality of any beta-OH. Finally, the
enoylreductase specificity for substituted malonyls as extender units may influence
the stereochemistry when there is a complete KR/DH/ER available.
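
The second and fourth degrees of freedom above can be sketched as two lookup tables: the extender unit fixes the substituent on the alpha carbon, and the set of reductive domains fixes the state of the beta carbon. The Python below is a minimal sketch under that reading; the names `ALPHA_SUBSTITUENT`, `BETA_STATE`, and `describe_position` are hypothetical, and stereochemistry is deliberately left out.

```python
# Extender unit -> substituent carried on the alpha carbon of the new unit.
ALPHA_SUBSTITUENT = {
    "malonyl": "H",
    "methylmalonyl": "methyl",
    "ethylmalonyl": "ethyl",
}

# Reductive domains present in a module -> resulting state of the beta carbon.
BETA_STATE = {
    frozenset(): "ketone",
    frozenset({"KR"}): "alcohol",
    frozenset({"KR", "DH"}): "C-C double bond",
    frozenset({"KR", "DH", "ER"}): "C-C single bond",
}


def describe_position(extender, reductive_domains):
    """Return (alpha substituent, beta-carbon state) for one extender module."""
    key = frozenset(reductive_domains)
    if extender not in ALPHA_SUBSTITUENT:
        raise ValueError(f"unknown extender unit: {extender}")
    if key not in BETA_STATE:
        # e.g. DH without KR is not one of the four outcomes listed in the text
        raise ValueError(f"unsupported reductive domain set: {sorted(key)}")
    return ALPHA_SUBSTITUENT[extender], BETA_STATE[key]


print(describe_position("methylmalonyl", ["KR"]))
print(describe_position("malonyl", ["KR", "DH", "ER"]))
```

Only the four reductive outcomes named in the text are accepted; any other domain combination raises an error instead of guessing a product.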

Thus, the modular PKS systems generally and the megalomicin PKS 
system particularly permit a wide range of polyketides to be synthesized. As
compared to the aromatic PKS systems, the modular PKS systems accept a wider
range of starter units, including aliphatic monomers (acetyl, propionyl, butyryl,
isovaleryl, and the like), aromatics (aminohydroxybenzoyl), alicyclics
(cyclohexanoyl), and heterocyclics (thiazolyl). Certain modular PKSs have relaxed
specificity for their starter units (Kao et al., 1994, Science, supra). Modular PKSs
also exhibit considerable variety with regard to the choice of extender units in
each condensation cycle. The degree of beta-ketoreduction following a
condensation reaction can be altered by genetic manipulation (Donadio et al.,
1991, Science, supra; Donadio et al., 1993, Proc. Natl. Acad. Sci. USA 90: 7119-
7123). Likewise, the size of the polyketide product can be varied by designing
mutants with the appropriate number of modules (Kao et al., 1994, J. Am. Chem.
Soc. 116: 11612-11613). Lastly, modular PKS enzymes are particularly well
known for generating an impressive range of asymmetric centers in their products
in a highly controlled manner. The polyketides, antibiotics, and other compounds
produced by the methods of the invention are typically single stereoisomeric
forms. Although the compounds of the invention can occur as mixtures of
stereoisomers, it may be beneficial in some instances to generate individual
stereoisomers. Thus, the combinatorial potential within modular PKS pathways
based on any naturally occurring modular PKS scaffold, such as that of megalomicin,
is virtually unlimited.

While hybrid PKSs are most often produced by "mixing and matching"
portions of PKS coding sequences, mutations in DNA encoding a PKS can also be
used to introduce, alter, or delete an activity in the encoded polypeptide. Mutations
can be made to the native sequences using conventional techniques. The substrates
for mutation can be an entire cluster of genes or only one or two of them; the
substrate for mutation may also be portions of one or more of these genes.
Techniques for mutation include preparing synthetic oligonucleotides including
the mutations and inserting the mutated sequence into the gene encoding a PKS
subunit using restriction endonuclease digestion. See, e.g., Kunkel, 1985, Proc.
Natl. Acad. Sci. USA 82: 448; Geisselsoder et al., 1987, BioTechniques 5: 786.
Alternatively, the mutations can be effected using a mismatched primer (generally
10-20 nucleotides in length) that hybridizes to the native nucleotide sequence, at a
temperature below the melting temperature of the mismatched duplex. The primer
can be made specific by keeping primer length and base composition within
relatively narrow limits and by keeping the mutant base centrally located. See
Zoller and Smith, 1983, Methods Enzymol. 100: 468. Primer extension is effected
using DNA polymerase, the product cloned, and clones containing the mutated
DNA, derived by segregation of the primer extended strand, selected.
Identification can be accomplished using the mutant primer as a hybridization
probe. The technique is also applicable for generating multiple point mutations.
See, e.g., Dalbie-McFarland et al., 1982, Proc. Natl. Acad. Sci. USA 79: 6409.
PCR mutagenesis can also be used to effect the desired mutations. 
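
The mismatched-primer approach can be illustrated with a short Python sketch that builds a primer of 10-20 nucleotides around a single base change and keeps the mutant base in the center. The function name `design_mismatch_primer`, the 0-based position convention, and the toy template are assumptions for illustration. The primer is written in the sense of the template string supplied, so it would anneal to the complementary strand, and hybridization temperature is not modeled.

```python
def design_mismatch_primer(template, position, new_base, length=17):
    """Return a primer carrying one centrally located mismatch.

    template -- native sequence (5'->3'), as a string of A/C/G/T
    position -- 0-based index of the base to change
    new_base -- replacement base
    length   -- primer length; the text suggests 10-20 nucleotides
    """
    template = template.upper()
    new_base = new_base.upper()
    if not 10 <= length <= 20:
        raise ValueError("primer length should be 10-20 nucleotides")
    if template[position] == new_base:
        raise ValueError("new base equals the native base; no mismatch")
    start = position - length // 2          # centers the mutant base
    end = start + length
    if start < 0 or end > len(template):
        raise ValueError("mutation is too close to the end of the template")
    window = template[start:end]
    offset = position - start
    return window[:offset] + new_base + window[offset + 1:]


toy_template = "ACGTACGTACGTACGTACGTACGT"   # hypothetical 24-nt sequence
primer = design_mismatch_primer(toy_template, 12, "G", length=11)
print(primer)  # TACGTGCGTAC
```

Rejecting positions near the template ends keeps native bases on both sides of the mismatch, which is what the text means by keeping the mutant base centrally located.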

Random mutagenesis of selected portions of the nucleotide sequences 
encoding enzymatic activities can also be accomplished by several different
techniques known in the art, e.g., by inserting an oligonucleotide linker randomly
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating
incorrect nucleotides during in vitro DNA synthesis, by error-prone PCR
mutagenesis, by preparing synthetic mutants, or by damaging plasmid DNA in
vitro with chemicals, in accordance with the methods of the present invention.
Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
nitrosoguanidine, hydroxylamine, agents which damage or remove bases thereby
preventing normal base-pairing such as hydrazine or formic acid, analogues of
nucleotide precursors such as 5-bromouracil, 2-aminopurine, or acridine
intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemical mutagens,
transformed into E. coli and propagated as a pool or library of mutant plasmids.

In constructing a hybrid PKS of the invention, regions encoding enzymatic 
activity, i.e., regions encoding corresponding activities from different PKS
synthases or from different locations in the same PKS, can be recovered, for
example, using PCR techniques with appropriate primers. By "corresponding"
activity encoding regions is meant those regions encoding the same general type of
activity. For example, a KR activity encoded at one location of a gene cluster
"corresponds" to a KR encoding activity in another location in the gene cluster or
in a different gene cluster. Similarly, a complete reductase cycle could be 
considered corresponding. For example, KR/DH/ER can correspond to a KR 
alone. 

If replacement of a particular target region in a host PKS is to be made,
this replacement can be conducted in vitro using suitable restriction enzymes. The
replacement can also be effected in vivo using recombinant techniques involving
homologous sequences framing the replacement gene in a donor plasmid and a
receptor region in a recipient plasmid. Such systems, advantageously involving
plasmids of differing temperature sensitivities, are described, for example, in PCT
publication No. WO 96/40968, incorporated herein by reference. The vectors used
to perform the various operations to replace the enzymatic activity in the host PKS
genes or to support mutations in these regions of the host PKS genes can be
chosen to contain control sequences operably linked to the resulting coding
sequences in a manner such that expression of the coding sequences can be
effected in an appropriate host.

However, simple cloning vectors may be used as well. If the cloning 
vectors employed to obtain PKS genes encoding derived PKS lack control 
sequences for expression operably linked to the encoding nucleotide sequences, 
the nucleotide sequences are inserted into appropriate expression vectors. This
need not be done individually, but a pool of isolated encoding nucleotide
sequences can be inserted into expression vectors, the resulting vectors
transformed or transfected into host cells, and the resulting cells plated out into
individual colonies. The invention provides a variety of recombinant DNA
compounds in which the various coding sequences for the domains and modules
of the megalomicin PKS are flanked by non-naturally occurring restriction enzyme 
recognition sites. 

The various PKS nucleotide sequences can be cloned into one or more 
recombinant vectors as individual cassettes, with separate control elements, or
under the control of, e.g., a single promoter. The PKS subunit encoding regions
can include flanking restriction sites to allow for the easy deletion and insertion of
other PKS subunit encoding sequences so that hybrid PKSs can be generated. The
design of such unique restriction sites is known to those of skill in the art and can
be accomplished using the techniques described above, such as site-directed
mutagenesis and PCR. 
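
One practical check when choosing flanking sites is that the candidate recognition sequence does not already occur inside the cassette, so that the engineered sites remain unique. The Python sketch below counts occurrences of a site on both strands. The helper names and the toy cassette sequence are hypothetical; the recognition sequences shown (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT) are the standard ones for those enzymes and are used only as examples.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(sequence, site):
    """Count occurrences of a recognition site on either strand."""
    sequence = sequence.upper()
    # A palindromic site equals its reverse complement, so the set holds it once.
    targets = {site.upper(), reverse_complement(site)}
    total = 0
    for target in targets:
        index = sequence.find(target)
        while index != -1:
            total += 1
            index = sequence.find(target, index + 1)
    return total


def usable_as_flank(cassette, site):
    """A site can flank a cassette only if it is absent from the cassette."""
    return count_sites(cassette, site) == 0


toy_cassette = "AAGAATTCTTGGATCCAAGAATTC"   # hypothetical sequence
for name, site in [("EcoRI", "GAATTC"), ("BamHI", "GGATCC"), ("HindIII", "AAGCTT")]:
    print(name, count_sites(toy_cassette, site), usable_as_flank(toy_cassette, site))
```

In this toy case only the HindIII site is absent from the cassette, so it is the only one of the three that could be introduced at the flanks without creating additional cut sites.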

The expression vectors containing nucleotide sequences encoding a variety 
of PKS enzymes for the production of different polyketides are then transformed 
into the appropriate host cells to construct the library. In one straightforward
approach, a mixture of such vectors is transformed into the selected host cells and
the resulting cells plated into individual colonies and selected to identify
successful transformants. Each individual colony has the ability to produce a
particular PKS and ultimately a particular polyketide. Typically, there
will be duplications in some, most, or all of the colonies; the subset of the
transformed colonies that contains a different PKS in each member colony can be
considered the library. Alternatively, the expression vectors can be used
individually to transform hosts, which transformed hosts are then assembled into a
library. A variety of strategies are available to obtain a multiplicity of colonies
each containing a PKS gene cluster derived from the naturally occurring host gene
cluster so that each colony in the library produces a different PKS and ultimately a
different polyketide. The number of different polyketides that are produced by the
library is typically at least four, more typically at least ten, and preferably at least
20, and more preferably at least 50, reflecting similar numbers of different altered
PKS gene clusters and PKS gene products. The number of members in the library
is arbitrarily chosen; however, the degrees of freedom outlined above with respect
to the variation of starter, extender units, stereochemistry, oxidation state, and
chain length enable the production of quite large libraries.

Methods for introducing the recombinant vectors of the invention into
suitable hosts are known to those of skill in the art and typically include the use of
CaCl2 or agents such as other divalent cations, lipofection, DMSO, protoplast
transformation, conjugation, infection, transfection, and electroporation. The
polyketide producing colonies can be identified and isolated using known
techniques and the produced polyketides further characterized. The polyketides
produced by these colonies can be used collectively in a panel to represent a 
library or may be assessed individually for activity. 

The libraries of the invention can thus be considered at four levels: (1) a
multiplicity of colonies each with a different PKS encoding sequence; (2) the
proteins produced from the coding sequences; (3) the polyketides produced from
the proteins assembled into a functional PKS; and (4) antibiotics or compounds
with other desired activities derived from the polyketides. Of course, combination
libraries can also be constructed wherein members of a library derived, for
example, from the megalomicin PKS can be considered as a part of the same
library as those derived from, for example, the rapamycin PKS or DEBS. 

Colonies in the library are induced to produce the relevant synthases and 
thus to produce the relevant polyketides to obtain a library of polyketides. The 
polyketides secreted into the media can be screened for binding to desired targets,
such as receptors, signaling proteins, and the like. The supernatants per se can be
used for screening, or partial or complete purification of the polyketides can first
be effected. Typically, such screening methods involve detecting the binding of
each member of the library to a receptor or other target ligand. Binding can be
detected either directly or through a competition assay. Means to screen such
libraries for binding are well known in the art and can be applied in accordance
with the methods of the present invention. Alternatively, individual polyketide
members of the library can be tested against a desired target. In this event, screens
wherein the biological response of the target is measured can more readily be
included. Antibiotic activity can be verified using typical screening assays such as
those set forth in Lehrer et al., 1991, J. Immunol. Meth. 137: 167-173, incorporated
herein by reference, and in the Examples below. 

The invention provides methods for the preparation of a large number of 
polyketides. These polyketides are useful intermediates in formation of
compounds with antibiotic or other activity through hydroxylation, epoxidation,
and glycosylation reactions as described above. In general, the polyketide products
of the PKS must be further modified, typically by hydroxylation and glycosylation,
to exhibit potent antibiotic activity. Hydroxylation results in the novel polyketides
of the invention that contain hydroxyl groups at C-6, which can be accomplished
using the hydroxylase encoded by the eryF gene, and/or C-12, which can be
accomplished using the hydroxylase encoded by the picK or eryK gene. Also, the
oleP gene is available in recombinant form, which can be used to express the oleP
gene product in any host cell. A host cell, such as a Streptomyces host cell or a
Saccharopolyspora erythraea host cell, modified to express the oleP gene thus can
be used to produce polyketides comprising the C-8-C-8a epoxide present in 
oleandomycin. Thus the invention provides such modified polyketides. The 
presence of hydroxyl groups at these positions can enhance the antibiotic activity 
of the resulting compound relative to its unhydroxylated counterpart. 

Methods for glycosylating polyketides are generally known in the art and
can be applied in accordance with the methods of the present invention; the
glycosylation may be effected intracellularly by providing the appropriate
glycosylation enzymes or may be effected in vitro using chemical synthetic means
as described herein and in PCT publication No. WO 98/49315, incorporated
herein by reference. Preferably, glycosylation with desosamine, mycarose, and/or
megosamine is effected in accordance with the methods of the invention in
recombinant host cells provided by the invention. In general, the approaches to
effecting glycosylation mirror those described above with respect to
hydroxylation. The purified enzymes, isolated from native sources or
recombinantly produced, may be used in vitro. Alternatively and as noted,
glycosylation may be effected intracellularly using endogenous or recombinantly
produced intracellular glycosylases. In addition, synthetic chemical methods may 
be employed. 

The antibiotic modular polyketides may contain any of a number of 
different sugars, although D-desosamine, or a close analog thereof, is most
common. Erythromycin, picromycin, megalomicin, narbomycin, and methymycin
contain desosamine. Erythromycin also contains L-cladinose (3-O-methyl
mycarose). Tylosin contains mycaminose (4-hydroxy desosamine), mycarose and
6-deoxy-D-allose. 2-acetyl-1-bromodesosamine has been used as a donor to
glycosylate polyketides by Masamune et al., 1975, J. Am. Chem. Soc. 97: 3512-
3513. Other, apparently more stable donors include glycosyl fluorides,
thioglycosides, and trichloroacetimidates; see Woodward et al., 1981, J. Am.
Chem. Soc. 103: 3215; Martin et al., 1997, J. Am. Chem. Soc. 119: 3193; Toshima
et al., 1995, J. Am. Chem. Soc. 117: 3717; Matsumoto et al., 1988, Tetrahedron
Lett. 29: 3575. Glycosylation can also be effected using the polyketide aglycones
as starting materials and using Saccharopolyspora erythraea or Streptomyces
venezuelae or other host cell to make the conversion, preferably using mutants
unable to synthesize macrolides, as discussed in the preceding Section.

Thus, a wide variety of polyketides can be produced by the hybrid PKS 
enzymes of the invention. These polyketides are useful as antibiotics and as
intermediates in the synthesis of other useful compounds, as described in the 
following section. 

Section VII: Host Cells Containing Multiple Expression Vectors 

A recombinant host cell of the invention may contain nucleic acid 
encoding a megalomicin PKS domain, module, or protein, or megalomicin 
modification enzyme at a single genetic locus, e.g., on a single plasmid or at a
single chromosomal locus, or at different genetic loci, e.g., on separate plasmids
and/or chromosomal loci. By "multiple" is meant two or more; by "vector" is
meant a nucleic acid molecule which can be used to transform host systems and
which contains an independent expression system containing a coding sequence
under control of a promoter and optionally a selectable marker and any other
suitable sequences regulating expression. Typical such vectors are plasmids, but
other vectors such as phagemids, cosmids, viral vectors and the like can be used 
according to the nature of the host. Of course, one or more of the separate vectors 
may integrate into the chromosome of the host (selection may not be required for 
maintenance of integrated vectors). 

In one embodiment, the invention provides a recombinant host cell, which
comprises at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprising a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme
operably linked to a promoter. In another embodiment, the invention provides a
recombinant host cell, which comprises at least one autonomously replicating
recombinant DNA expression vector and at least one modified chromosome, each
of said vector(s) and each of said modified chromosome(s) comprising a recombinant
DNA compound encoding a megalomicin PKS domain or a megalomicin
modification enzyme operably linked to a promoter. Preferably, the autonomously
replicating recombinant DNA expression vector and/or the modified chromosome 
further comprises distinct selectable markers. 

The above multiple-vector (chromosome) expression systems can also be
used for expressing heterogeneous polyketide biosynthetic enzymes, e.g., for
expressing Micromonospora megalomicea megalomicin PKS protein, module, or
domain or a megalomicin modification enzyme with a PKS protein, module, or
domain, or modification enzyme from other origins in the same host cells. By
placing various activities on different expression vectors, a high degree of
variation can be achieved in an efficient manner. A variety of hosts can be used;
any suitable host cell that can maintain multiple vectors can readily be used.
Preferred hosts include Streptomyces, yeast, E. coli, other actinomycetes, and plant
cells, and mammalian or insect cells or other suitable recombinant hosts can also
be used. Preferred among yeast strains are Saccharomyces cerevisiae and Pichia
pastoris. Preferred actinomycetes include various strains of Streptomyces.

If one chooses to use a host cell that does not naturally produce a 
polyketide, then one may need to ensure that the recombinant host is modified to 
also contain a holo ACP synthase activity that effects pantetheinylation of the acyl
carrier protein. See PCT Pub. No. WO 97/13845, incorporated herein by
reference. One of the multiple vectors may be used for this purpose. This
pantetheinylation step is necessary for activation of the ACP. The expression system for
the holo ACP synthase may be supplied on a vector separate from that carrying a
PKS coding sequence or may be supplied on the same vector or may be integrated
into the chromosome of the host, or may be supplied as an expression system for a
fusion protein with all or a portion of a polyketide synthase (see U.S. Patent No.
6,033,883, incorporated herein by reference). 

It should be noted that in some recombinant hosts, it may also be necessary 
to activate the polyketides produced through postsynthesis modifications when
polyketides having such modifications are desired. If this is the case for a
particular host, the host will be modified, for example by transformation, to
contain those enzymes necessary for effecting these modifications. Among such
enzymes, for example, are glycosylation enzymes. The use of multiple vectors can
facilitate the introduction of expression systems for such enzymes.

In a preferred embodiment, the multiple vector system is used to assemble 
rapidly and efficiently a combinatorial library of polyketides and the 
PKS/modification enzymes that produce them. In an illustrative embodiment, the 
multiple vector system comprises four different vectors, one comprising the megAI 
gene, one the megAII gene, one the megAIII gene, and one the modification
enzyme(s) gene(s). Each of these vectors can be modified to make a set of vectors.
For example, one set could contain all possible AT substitutions in the loading and
first and second extender modules of the megAI gene product. Another set could
contain expression systems for a variety of different modification enzymes. With
these four vector sets and by combining each member of each set with each
member of the other three sets, a very large library of cells, vector sets, and 
polyketides can be rapidly and efficiently assembled. 
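
Combining each member of each vector set with each member of the other three sets amounts to taking a Cartesian product. The Python sketch below shows this with `itertools.product`. The set members are hypothetical placeholders chosen for illustration and do not name actual constructs.

```python
from itertools import product


def assemble_library(*vector_sets):
    """Combine each member of each vector set with each member of the others."""
    return list(product(*vector_sets))


# Hypothetical set members; the names are placeholders only.
megAI_set = ["megAI-native", "megAI-AT1-swap", "megAI-AT2-swap"]
megAII_set = ["megAII-native", "megAII-KR-deletion"]
megAIII_set = ["megAIII-native", "megAIII-AT5-swap"]
modification_set = ["no-modification", "glycosylation-enzymes"]

library = assemble_library(megAI_set, megAII_set, megAIII_set, modification_set)
print(len(library))  # 3 * 2 * 2 * 2 = 24 combinations
print(library[0])
```

Each tuple corresponds to one host cell carrying one vector from each set, so the library size is the product of the four set sizes.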

The combinatorial potential of a modular PKS such as the megalomicin 
PKS (ignoring the additional potential of different modification enzyme systems) 
is minimally given by $AT_L \times (AT_E \times 4)^M$, where $AT_L$ is the number of loading
acyl transferases, $AT_E$ is the number of extender acyl transferases, and $M$ is the
number of modules in the gene cluster. The number 4 is present in the formula
because this represents the number of ways a keto group can be modified: by
1) no reaction; 2) KR activity alone; 3) KR+DH activity; or 4) KR+DH+ER
activity. It has been shown that expression of only the first two modules of the
erythromycin PKS resulted in the production of a predicted truncated triketide
product (see Kao et al., J. Am. Chem. Soc., 116:11612-11613 (1994)). A novel
12-membered macrolide similar to methymycin aglycone was produced by
expression of modules 1-5 of this PKS in S. coelicolor (see Kao et al., J. Am.
Chem. Soc., 117:9105-9106 (1995)). This work shows that PKS modules are
functionally independent so that lactone ring size can be controlled by the number
of modules present.

In addition to controlling the number of modules, the modules can be
genetically modified, for example, by the deletion of a ketoreductase domain as
described by Donadio et al., Science, 252:675-679 (1991); and Donadio et al.,
Gene, 115:97-103 (1992). In addition, the mutation of an enoyl reductase domain
was reported by Donadio et al., Proc. Natl. Acad. Sci., 90:7119-7123 (1993).
These modifications also resulted in modified PKS and thus modified polyketides.
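
To give a feel for the scale implied by the combinatorial formula stated above, namely the number of loading acyl transferases multiplied by (the number of extender acyl transferases times 4) raised to the number of modules, the following Python sketch evaluates it for two sets of inputs. The function name and the input values are assumptions chosen for illustration, not figures from the disclosure.

```python
def combinatorial_potential(at_loading, at_extender, modules):
    """Minimal number of PKS variants: AT_L * (AT_E * 4) ** M."""
    if min(at_loading, at_extender, modules) < 1:
        raise ValueError("all counts must be positive integers")
    # The factor 4 covers: no reduction, KR, KR+DH, and KR+DH+ER.
    return at_loading * (at_extender * 4) ** modules


# Hypothetical inputs for a six-module PKS.
print(combinatorial_potential(1, 2, 6))  # 1 * (2 * 4) ** 6 = 262144
print(combinatorial_potential(2, 3, 6))  # 2 * (3 * 4) ** 6 = 5971968
```

Because the module count is an exponent, adding a single extender AT option or a single module changes the result far more than adding a loading AT option does.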

As stated above, in the present invention, the coding sequences for 
catalytic activities derived from the megalomicin PKS systems found in nature can 
be used in their native forms or modified by standard mutagenesis techniques to 
delete or diminish activity or to introduce an activity into a module in which it was
not originally present. For example, a KR activity can be introduced into a 
module normally lacking that function. 

In one embodiment of the invention herein, a single host cell is modified to 
contain a multiplicity of vectors, each vector contributing a portion of the 
synthesis of a megalomicin PKS and modification enzyme (if any) system. Each
of the multiple vectors for production of the megalomicin PKS system typically
encodes at least two modules, and at least one of the vectors integrates into the
chromosome of the host. Integration can be effected using suitable phage or
integrating vectors or by homologous recombination. If homologous
recombination is used, the integration event may also be designed to delete
endogenous PKS genes residing in the chromosome, as described in the PCT 
application WO 95/08548. In these embodiments, too, a selectable marker such as 
hygromycin or thiostrepton resistance can be included in the vector that effects 
integration. 

As mentioned above, additional enzymes that effect post-translational
modifications to the enzyme systems in the megalomicin PKS may be introduced
into the host through suitable recombinant expression systems. In addition,
enzymes that activate the polyketides themselves, for example, through
glycosylation may be added. It may also be desirable to modify the cell to produce
more of a particular substrate utilized in polyketide biosynthesis. For example, it
is generally believed that malonyl CoA levels in yeast are higher than
methylmalonyl CoA levels; if yeast is chosen as a host, it may be desirable to increase
methylmalonyl CoA levels by the addition of one or more biosynthetic enzymes
therefor. 

The multiple-vector expression system can also be used to make 
polyketides produced by the addition of synthetic starter units to a PKS that
contains an inactivated ketosynthase (KS) in the first module. As noted above,
this modification permits the system to incorporate a suitable diketide thioester
such as 3-hydroxy-2-methyl pentanoic acid-N-acetyl cysteamine thioester, or
similar thioesters of diketide analogs, as described by Jacobsen et al., Science,
277:367-369 (1997). The construction of PKS modules containing inactivated
ketosynthase regions can be conducted by methods known in the art, such as the
method described in U.S. Patent No. 6,080,555 and PCT publication Nos. WO
99/03986 and WO 97/02358, each of which is incorporated herein by reference, in
accordance with the methods of the present invention. 

The multiple-vector expression system can be used to produce polyketides
in hosts that normally do not produce them, such as E. coli and yeast. It also
provides more efficient means to provide a variety of polyketide products by
supplying the elements of the introduced PKS, whether in an E. coli or yeast host
or in other more traditionally used hosts, such as Streptomyces. The invention
also includes libraries of polyketides prepared using the methods of the invention.

Section VIII: Compounds 

The methods and recombinant DNA compounds of the invention are useful
in the production of polyketides. In one important aspect, the invention provides
methods for making antibiotic compounds related in structure to erythromycin, a
potent antibiotic compound. The invention also provides novel ketolide
compounds, polyketide compounds with potent antibiotic activity of significant
interest due to activity against antibiotic resistant strains of bacteria. See
Griesgraber et al., 1996, J. Antibiot. 49: 465-477, incorporated herein by
reference. Most if not all of the ketolides prepared to date are synthesized using
erythromycin A, a derivative of 6-dEB, as an intermediate. In one embodiment,
the present invention provides the 3-keto derivatives of the megalomicins for use
as antibiotics. In particular, the 3-keto derivative of megalomicin A is a preferred
ketolide of the invention. These compounds can be made chemically, substantially
in accordance with the procedures for making ketolides described in the prior art,
or in recombinant host cells of the invention in which the megosamine and
desosamine biosynthetic and transferase genes are present but which do not make
or transfer the mycarose moiety and/or in which the PKS has been modified to
delete the KR domain of extender module 6. The invention also provides methods
for making intermediates useful in preparing traditional, 6-dEB- and
erythromycin-derived ketolide compounds. See Griesgraber et al., supra;
Agouridas et al., 1998, J. Med. Chem. 41: 4080-4100; U.S. Patent Nos. 5,770,579;
5,760,233; 5,750,510; 5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614;
5,556,118; 5,543,400; 5,527,780; 5,444,051; 5,439,890; 5,439,889; and PCT
publication Nos. WO 98/09978 and 98/28316, each of which is incorporated
herein by reference.

As noted above, the hybrid PKS genes of the invention can be expressed in
a host cell that contains the desosamine, megosamine, and/or mycarose
biosynthetic genes and corresponding transferase genes as well as the required
hydroxylase gene(s), which may, for example and without limitation, be either
picK, megK, or eryK (for the C-12 position) and/or megF or eryF (for the C-6
position). The resulting compounds have antibiotic activity but can be further
modified, as described in the patent publications referenced above, to yield a
desired compound with improved or otherwise desired properties. Alternatively,
the aglycone compounds can be produced in the recombinant host cell, and the
desired glycosylation and hydroxylation steps carried out in vitro or in vivo, in the
latter case by supplying the converting cell with the aglycone, as described above.

The compounds of the invention are thus optionally glycosylated forms of
the polyketide set forth in formula (2) below which are hydroxylated at either the
C-6 or the C-12 or both. The compounds of formula (2) can be prepared using the
loading and the six extender modules of a modular PKS, modified or prepared in
hybrid form as herein described. These polyketides have the formula:

[structural drawing of formula (2)]

(2)

including the glycosylated and isolated stereoisomeric forms thereof;

wherein R* is a straight chain, branched or cyclic, saturated or unsaturated
substituted or unsubstituted hydrocarbyl of 1-15C;

each of R1-R6 is independently H or alkyl (1-4C) wherein any alkyl at R1
may optionally be substituted;

each of X1-X5 is independently two H, H and OH, or =O; or

each of X1-X5 is independently H and the compound of formula (2)
contains a double-bond in the ring adjacent to the position of said X at 2-3, 4-5,
6-7, 8-9 and/or 10-11;

with the proviso that:

at least two of R1-R6 are alkyl (1-4C).

Preferred compounds comprising formula 2 are those wherein at least three
of R1-R5 are alkyl (1-4C), preferably methyl or ethyl; more preferably wherein at
least four of R1-R5 are alkyl (1-4C), preferably methyl or ethyl. Also preferred are
those wherein X2 is two H, =O, or H and OH, and/or X3 is H, and/or X1 is OH
and/or X4 is OH and/or X5 is OH. Also preferred are compounds with variable R*
when R1-R5 are methyl, X2 is =O, and X1, X4 and X5 are OH. The glycosylated
forms (i.e., mycarose or cladinose at C-3, desosamine at C-5, and/or megosamine
at C-6) of the foregoing are also preferred.

As described above, there are a wide variety of diverse organisms that can
modify compounds such as those described herein to provide compounds with or
that can be readily modified to have useful activities. For example,
Saccharopolyspora erythraea can convert 6-dEB to a variety of useful
compounds. The compounds provided by the present invention can be provided to
cultures of Saccharopolyspora erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the Examples, below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production. Each of the
erythromycins A, B, C, and D has antibiotic activity, although erythromycin A has
the highest antibiotic activity. Moreover, each of these compounds can form,
under treatment with mild acid, a C-6 to C-9 hemiketal with motilide activity. For
formation of hemiketals with motilide activity, erythromycins B, C, and D are
preferred, as the presence of a C-12 hydroxyl allows the formation of an inactive
compound that has a hemiketal formed between C-9 and C-12.

Thus, the present invention provides the compounds produced by
hydroxylation and glycosylation of the compounds of the invention by action of
the enzymes endogenous to Saccharopolyspora erythraea and mutant strains of S.
erythraea. Such compounds are useful as antibiotics or as motilides directly or
after chemical modification. For use as antibiotics, the compounds of the
invention can be used directly without further chemical modification.
Erythromycins A, B, C, and D all have antibiotic activity, and the corresponding
compounds of the invention that result from the compounds being modified by
Saccharopolyspora erythraea also have antibiotic activity. These compounds can
be chemically modified, however, to provide other compounds of the invention
with potent antibiotic activity. For example, alkylation of erythromycin at the C-6
hydroxyl can be used to produce potent antibiotics (clarithromycin is
C-6-O-methyl), and other useful modifications are described in, for example,
Griesgraber et al., 1996, J. Antibiot. 49: 465-477; Agouridas et al., 1998, J. Med.
Chem. 41: 4080-4100; U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510;
5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400;
5,527,780; 5,444,051; 5,439,890; and 5,439,889; and PCT publication Nos. WO
98/09978 and 98/28316, each of which is incorporated herein by reference.

For use as motilides, the compounds of the invention can be used directly
without further chemical modification. Erythromycin and certain erythromycin
analogs are potent agonists of the motilin receptor that can be used clinically as
prokinetic agents to induce phase III of migrating motor complexes, to increase
esophageal peristalsis and LES pressure in patients with GERD, to accelerate
gastric emptying in patients with gastric paresis, and to stimulate gall bladder
contractions in patients after gallstone removal and in diabetics with autonomic
neuropathy. See Peeters, 1999, Motilide Web Site,
http://www.med.kuleuven.ac.be/med/gih/motilid.htm, and Omura et al., 1987,
Macrolides with gastrointestinal motor stimulating activity, J. Med. Chem. 30:
1941-3. The corresponding compounds of the invention that result from the
compounds of the invention being modified by Saccharopolyspora erythraea also
have motilide activity, particularly after conversion, which can also occur in vivo,
to the C-6 to C-9 hemiketal by treatment with mild acid. Compounds lacking the
C-12 hydroxyl are especially preferred for use as motilin agonists. These
compounds can also be further chemically modified, however, to provide other
compounds of the invention with potent motilide activity.

Moreover, and also as noted above, there are other useful organisms that
can be employed to hydroxylate and/or glycosylate the compounds of the
invention. As described above, the organisms can be mutants unable to produce
the polyketide normally produced in that organism, the fermentation can be carried
out on plates or in large fermentors, and the compounds produced can be
chemically altered after fermentation. In addition to Saccharopolyspora erythraea,
Streptomyces venezuelae, S. narbonensis, S. antibioticus, Micromonospora
megalomicea, S. fradiae, and S. thermotolerans can also be used. In addition to
antibiotic activity, compounds of the invention produced by treatment with M.
megalomicea enzymes can have antiparasitic activity as well. Thus, the present
invention provides the compounds produced by hydroxylation and glycosylation
by action of the enzymes endogenous to S. erythraea, S. venezuelae, S.
narbonensis, S. antibioticus, M. megalomicea, S. fradiae, and S. thermotolerans.

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene, can be included on expression
vectors suitable for expression of the encoded gene products in
Saccharopolyspora erythraea, Micromonospora megalomicea, S. venezuelae, S.
narbonensis, S. antibioticus, S. fradiae, and S. thermotolerans.

The compounds of the invention can be produced by growing and
fermenting the host cells of the invention under conditions known in the art for the
production of other polyketides. The compounds of the invention can be isolated
from the fermentation broths of these cultured cells and purified by standard
procedures. The compounds can be readily formulated to provide the
pharmaceutical compositions of the invention. The pharmaceutical compositions
of the invention can be used in the form of a pharmaceutical preparation, for
example, in solid, semisolid, or liquid form. This preparation will contain one or
more of the compounds of the invention as an active ingredient in admixture with
an organic or inorganic carrier or excipient suitable for external, enteral, or
parenteral application. The active ingredient may be compounded, for example,
with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pellets,
capsules, suppositories, solutions, emulsions, suspensions, and any other form
suitable for use.

The carriers which can be used include water, glucose, lactose, gum acacia,
gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin,
colloidal silica, potato starch, urea, and other carriers suitable for use in
manufacturing preparations, in solid, semi-solid, or liquified form. In addition,
auxiliary stabilizing, thickening, and coloring agents and perfumes may be used.
For example, the compounds of the invention may be utilized with hydroxypropyl
methylcellulose essentially as described in U.S. Patent No. 4,916,138,
incorporated herein by reference, or with a surfactant essentially as described in
EPO patent publication No. 428,169, incorporated herein by reference.

Oral dosage forms may be prepared essentially as described by Hondo et
al., 1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein
by reference. Dosage forms for external application may be prepared essentially as
described in EPO patent publication No. 423,714, incorporated herein by
reference. The active compound is included in the pharmaceutical composition in
an amount sufficient to produce the desired effect upon the disease process or
condition.

For the treatment of conditions and diseases caused by infection, a
compound of the invention may be administered orally, topically, parenterally, by
inhalation spray, or rectally in dosage unit formulations containing conventional
non-toxic pharmaceutically acceptable carriers, adjuvants, and vehicles. The term
parenteral, as used herein, includes subcutaneous injections, and intravenous,
intramuscular, and intrasternal injection or infusion techniques.

Dosage levels of the compounds of the invention are of the order from
about 0.01 mg to about 50 mg per kilogram of body weight per day, preferably
from about 0.1 mg to about 10 mg per kilogram of body weight per day. The
dosage levels are useful in the treatment of the above-indicated conditions (from
about 0.7 mg to about 3.5 gm per patient per day, assuming a 70 kg patient). In
addition, the compounds of the invention may be administered on an intermittent
basis, i.e., at semi-weekly, weekly, semi-monthly, or monthly intervals.
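
The per-patient figures follow from multiplying the per-kilogram range by body weight. The short Python sketch below is illustrative only; the function name is hypothetical and the 70 kg weight is the assumption stated in the text. For the broad range it gives 70 x 0.01 = 0.7 mg up to 70 x 50 = 3500 mg (3.5 gm) per day, and for the preferred range about 7 mg to 700 mg per day.

```python
def daily_dose_range_mg(weight_kg, low_mg_per_kg, high_mg_per_kg):
    """Return (low, high) total daily dose in mg for a given body weight.

    Hypothetical helper for illustration; not part of the specification.
    """
    return (weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg)


PATIENT_KG = 70  # reference patient weight assumed in the text

# Broad range: about 0.01 to about 50 mg per kg per day
broad = daily_dose_range_mg(PATIENT_KG, 0.01, 50)

# Preferred range: about 0.1 to about 10 mg per kg per day
preferred = daily_dose_range_mg(PATIENT_KG, 0.1, 10)

# Round for display, since floating-point products are not exact
print("broad range (mg/day):", round(broad[0], 3), "to", round(broad[1], 3))
print("preferred range (mg/day):", round(preferred[0], 3), "to", round(preferred[1], 3))
```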

The amount of active ingredient that may be combined with the carrier
materials to produce a single dosage form will vary depending upon the host
treated and the particular mode of administration. For example, a formulation
intended for oral administration to humans may contain from 0.5 mg to 5 gm of
active agent compounded with an appropriate and convenient amount of carrier
material, which may vary from about 5 percent to about 95 percent of the total
composition. Dosage unit forms will generally contain from about 0.5 mg to about
500 mg of active ingredient. For external administration, the compounds of the
invention may be formulated within the range of, for example, 0.00001% to 60%
by weight, preferably from 0.001% to 10% by weight, and most preferably from
about 0.005% to 0.8% by weight.

It will be understood, however, that the specific dose level for any
particular patient will depend on a variety of factors. These factors include the
activity of the specific compound employed; the age, body weight, general health,
sex, and diet of the subject; the time and route of administration and the rate of
excretion of the drug; whether a drug combination is employed in the treatment;
and the severity of the particular disease or condition for which therapy is sought.

A detailed description of the invention having been provided above, the
following examples are given for the purpose of illustrating the invention and shall
not be construed as being a limitation on the scope of the invention or claims.

Example 1 

Cloning and Characterization of the Megalomicin Biosynthetic Gene Cluster from
Micromonospora megalomicea

Experimental Procedures

Bacterial Strains, Media, and Growth Conditions 

Routine DNA manipulations were performed in Escherichia coli XL1 Blue
or E. coli XL1 Blue MR (Stratagene) using standard culture conditions (Sambrook
et al., 1989). M. megalomicea subsp. nigra NRRL3275 was obtained from the
ATCC collection and cultured according to recommended protocols. For isolation
of genomic DNA, M. megalomicea was grown in TSB (Hopwood et al., 1985) at
30°C. S. lividans K4-114 (Ziermann and Betlach, 1999), which carries a deletion
of the actinorhodin biosynthetic gene cluster, was used as the host for expression
of the megAI-AIII genes. S. lividans strains were maintained on R5 agar at 30°C
and grown in liquid YEME for preparation of protoplasts (Hopwood et al., 1985).
S. erythraea NRRL2338 was used for expression of the megosamine genes. S.
erythraea strains were maintained on R5 agar at 34°C and grown in liquid TSB for
preparation of protoplasts.



Manipulation of DNA and Organisms 

Manipulation and transformation of DNA in E. coli was performed by
standard procedures (Sambrook et al., 1989) or by suppliers' protocols. Protoplasts
of S. lividans and S. erythraea were generated for transformation by plasmid DNA
using the standard procedure. S. lividans transformants were selected on R5 using
2 ml of a 0.5 mg/ml thiostrepton overlay. S. erythraea transformants were selected
on R5 using 1.5 ml of a 0.6 mg/ml apramycin overlay.

Isolation of the meg gene cluster

A cosmid library was prepared in SuperCos (Stratagene) from M.
megalomicea total DNA partially digested with Sau3A I, and introduced into E.
coli using a Gigapack III XL (Stratagene) in vitro packaging kit. 32P-labelled DNA
probes encompassing the KS2 domain from ery DEBS, or a mixture of segments
encompassing modules 1 and 2 from ery DEBS, were used separately to screen the
cosmid library by colony hybridization. Several colonies which hybridized with
the probes were further analyzed by sequencing the ends of their cosmid inserts
using T3 and T7 primers. BLAST (Altschul et al., 1990) analysis of the sequences
revealed several colonies with DNA sequences highly homologous to genes from
the ery cluster. Together with restriction analysis, this led to the isolation of two
overlapping cosmids, pKOS079-93A and pKOS079-93D, which covered ~45 kb of
the meg cluster. A 400 bp PCR fragment was generated from the left end of
pKOS079-93D and used to reprobe the cosmid library. Likewise, a 200 bp PCR
fragment generated from the right end of pKOS079-93A was used to reprobe the
cosmid library. Analysis of hybridizing colonies as described above resulted in
identification of two additional cosmids, pKOS079-138B and pKOS079-124B,
which overlap the previous two cosmids. BLAST analysis of the far left and right
end sequences of these cosmids indicated no homology to any known genes
related to polyketide biosynthesis, which indicates that the set of four
cosmids spans the entire megalomicin biosynthetic gene cluster.

DNA sequencing and analysis 

PCR-based double stranded DNA sequencing was performed on a
Beckman CEQ 2000 capillary sequencer using reagents and protocols provided by
the manufacturer. A shotgun library of the entire cosmid pKOS079-93D insert was
made as follows: DNA was first digested with Dra I to eliminate the vector
fragment, then partially digested with Sau3A I. After agarose electrophoresis,
bands between 1-3 kb were excised from the gel and ligated with BamH I digested
pUC19. Another shotgun library was generated from a 12 kb Xho I/EcoR I
fragment subcloned from cosmid pKOS079-93A to extend the sequence to the
megF gene. A 4 kb Bgl II/Xho I fragment from cosmid pKOS079-138B was
sequenced by primer walking to extend the sequencing to the megT gene.
Sequence was assembled using the Sequencher (Gene Codes Corp.) software
package and analyzed with MacVector (Oxford Molecular Group) and the NCBI
BLAST server (www.ncbi.nlm.nih.gov/BLAST/).
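
The shotgun cloning step relies on Sau3A I fragments being ligatable into BamH I digested vector. A minimal Python sketch of why this works is given below. It assumes the standard recognition sequences (Sau3A I cuts before GATC; BamH I cuts G^GATCC), which the text does not state, and it uses a made-up toy sequence rather than any cosmid sequence. It also models a complete digest, whereas the procedure above uses a partial digest in which only a subset of sites is cut.

```python
SAU3A_SITE = "GATC"    # Sau3A I: cuts immediately 5' of GATC (assumed standard site)
BAMH1_SITE = "GGATCC"  # BamH I: cuts between the two Gs, G^GATCC (assumed standard site)


def find_sites(seq, site):
    """Return the 0-based start index of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, cut_positions):
    """Fragment lengths of a linear sequence cut at every given position."""
    bounds = [0] + sorted(cut_positions) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Both enzymes leave the same 4-base 5' overhang, so the ends are compatible.
sau3a_overhang = SAU3A_SITE
bamh1_overhang = BAMH1_SITE[1:5]

toy = "AAGATCTTTTGGATCCAAAAGATCGG"  # hypothetical sequence for illustration
sau3a_cuts = find_sites(toy, SAU3A_SITE)

print("Sau3A I cut positions:", sau3a_cuts)
print("complete-digest fragment lengths:", fragment_lengths(toy, sau3a_cuts))
print("overhangs match:", sau3a_overhang == bamh1_overhang)
```

Every BamH I site contains a Sau3A I site, but not the reverse, which is why Sau3A I is suited to generating many overlapping fragments while BamH I opens the vector at a single defined position.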


Plasmids 

Plasmid pKOS108-6 is a modified version of pKAO127'kan' (Ziermann
and Betlach, 1999; Ziermann and Betlach, 2000) in which the eryAI-III genes
between the Pac I and EcoR I sites have been replaced with the megAI-III genes.
This was done by first substituting a synthetic nucleotide DNA duplex (5'-
TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC (SEQ ID NO: 21),
complementary oligo 5'-
AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT (SEQ ID NO:
22)) between the Pac I and EcoR I sites of the pKAO127'kan' vector fragment.
The 22 kb EcoR I/Bgl II fragment from cosmid pKOS079-93D containing the
megAI-II genes was inserted into the EcoR I and Bgl II sites of the resulting
plasmid to generate pKOS024-84. A 12 kb Bgl II/BbvC I fragment containing the
megAIII and part of the megCII gene was subcloned from pKOS079-93A and
excised as a Bgl II/Xba I fragment and ligated into the corresponding sites of
pKOS024-84 to yield the final expression plasmid pKOS108-06.
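
The design of the synthetic duplex can be checked directly from the two oligonucleotide sequences. The Python sketch below is an illustration, not part of the described procedure. It confirms that SEQ ID NO: 22 contains the full reverse complement of SEQ ID NO: 21, reports the unpaired bases left at each end of SEQ ID NO: 22, and locates four restriction sites in the linker. The recognition sequences used (EcoR I GAATTC, Bgl II AGATCT, BbvC I CCTCAGC, Xba I TCTAGA) are the standard ones and are an assumption here, since the text names the enzymes but not their sites.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


TOP = "TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC"           # SEQ ID NO: 21
BOTTOM = "AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT"  # SEQ ID NO: 22

# Standard recognition sequences (assumed; not given in the text)
SITES = {
    "EcoR I": "GAATTC",
    "Bgl II": "AGATCT",
    "BbvC I": "CCTCAGC",
    "Xba I": "TCTAGA",
}

# Locate the region of BOTTOM that base-pairs with the whole of TOP.
rc_top = reverse_complement(TOP)
core_start = BOTTOM.find(rc_top)
five_prime_ext = BOTTOM[:core_start]                 # unpaired 5' bases of BOTTOM
three_prime_ext = BOTTOM[core_start + len(rc_top):]  # unpaired 3' bases of BOTTOM

# Position of each site along TOP (0-based; -1 would mean "not found").
site_positions = {name: TOP.find(site) for name, site in SITES.items()}

print("paired core found at index:", core_start)
print("5' extension of SEQ ID NO: 22:", five_prime_ext)
print("3' extension of SEQ ID NO: 22:", three_prime_ext)
print("site positions in SEQ ID NO: 21:", site_positions)
```

On this reading, the 5'-AATT extension matches the overhang of an EcoR I cut vector end and the 3'-AT extension matches that of a Pac I cut end, while the internal EcoR I, Bgl II, BbvC I, and Xba I sites occur in the order in which the later cloning steps use them.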

The megosamine integrating vector, pKOS97-42, was constructed as
follows: A subclone was generated containing the 4 kb Xho I/Sca I fragment from
pKOS79-138B together with the 1.7 kb Sca I/Pst I fragment from pKOS79-93D in
Litmus 28 (Stratagene). The entire 5.7 kb fragment was then excised as a Spe I/Pst
I fragment and combined with the 6.3 kb Pst I/EcoR I fragment from pKOS79-93D
and EcoR I/Xba I digested pSET152 (Bierman et al., 1992) to construct plasmid
pKOS97-42.

Production and analysis of secondary metabolites 
Fermentation for production of polyketide, LC/MS analysis, and
quantification of 6-dEB for S. lividans K4-114/pKOS108-6 and S. lividans
K4-114/pKAO127'kan' were essentially as previously described (Xue et al., 1999).
S. erythraea NRRL2338 and S. erythraea/pKOS97-42 were grown for 6 days in F1
media (Brunker et al., 1998). Samples of broth were clarified in a microcentrifuge
(5 min, 13,000 rpm). For LC/MS preparation, isopropanol was added to the
supernatant (1:2 ratio) and centrifuged again. Erythromycins and megalomicins
were detected by electrospray mass spectrometry and quantity was determined by
evaporative light scattering detection (ELSD). The LC retention time and mass
spectra of erythromycin and megalomicins were identical to known standards.

Nucleotide sequence of the meg gene cluster 

A series of 4 overlapping inserts containing the meg cluster (Figure 9) were
isolated from a cosmid library prepared from total genomic DNA of M.
megalomicea and cover > 100 kb of the genome. A contiguous 48 kb segment
which encodes the megalomicin PKS and several deoxysugar biosynthetic genes
was sequenced and analyzed. The segment contains 17 complete ORFs as well as
an incomplete ORF at each end, organized as shown in Figure 9.

PKS genes. The ORFs megAI, megAII and megAIII encode the polyketide
synthase responsible for synthesis of 6-dEB. The enzyme complex, meg DEBS, is
highly similar to ery DEBS, with each of the three predicted polypeptides sharing
an average of 83% overall similarity with their ery PKS counterpart. Both PKSs
are composed of 6 modules (2 modules per polypeptide) and each module is
organized in the identical manner (Figure 9). A dendrogram analysis (Schwecke et
al., 1995) employing 70 acyltransferase (AT) domains revealed that the 6 meg
extender AT domains cluster with AT domains that incorporate methylmalonyl
CoA (not shown). The loading module of meg DEBS also lacks a KSQ domain
which is utilized by most macrolide PKSs for decarboxylation of the starter unit to
initiate polyketide synthesis (Bisang et al., 1999; Kuhstoss et al., 1996; Kakavas et
al., 1997; Xue et al., 1998), implying that priming begins with a propionate unit.
In addition, a conserved Gly to Pro substitution in the NADPH-binding region of
the ketoreductase (KR) domain of module 3 is observed in meg DEBS, which has
been proposed to account for its inactivity in ery DEBS (Donadio et al., 1991).

Deoxysugar genes. BLAST (Altschul et al., 1990) analysis of the genes
flanking the PKS indicated that 12 complete ORFs and 1 partial ORF appear to
encode functions required for synthesis of one of the three megalomicin
deoxysugars. Assignment of each ORF to a specific deoxysugar pathway was
made based on comparison to the ery genes and other related genes involved in
deoxysugar biosynthesis (Table 2).

a. Determined by BLASTX analysis using default parameters.

Three ORFs, megBV, megCIII and megDI, encode glycosyltransferases,
apparently one for attachment of each deoxysugar to the macrolide. MegBV was
most similar to EryBV, the erythromycin mycarosyltransferase, and hence was
assigned to the mycarose pathway in the meg cluster. The closest match for both of
the remaining glycosyltransferases was EryCIII, the desosaminyltransferase in
erythromycin biosynthesis. Given the higher degree of similarity between EryCIII
and MegCIII (Table 2), MegCIII was designated the desosaminyltransferase,
leaving MegDI as the proposed megosaminyltransferase. In similar fashion,
assignments were made accordingly for: MegCII and MegDVI, two putative
3,4-isomerases similar to EryCII; MegBII and MegDVII, 2,3-reductases homologous
to EryBII; MegBIV and MegDV, putative 4-ketoreductases similar to EryBIV
(Table 2). The remaining ORFs involved in deoxysugar biosynthesis, megT,
megDII, megDIII and megDIV, each encode a putative 2,3-dehydratase,
aminotransferase, dimethyltransferase and 3,5-epimerase, respectively (Table 2).
Since both the megosamine and desosamine pathways require an aminotransferase
and a dimethyltransferase, and since mycarose and megosamine each require a
2,3-dehydratase and a 3,5-epimerase, assignments of these four genes to a specific
pathway could not be made on the basis of sequence comparison alone. However,
the latter three are implicated in megosamine biosynthesis by experiments
described below.

Other genes. Two additional complete ORFs, designated megY and megH,
and an incomplete ORF, designated megF, were also identified in the cluster.
MegH and MegF share high degrees of similarity with EryH and EryF. EryH and
homologs in other macrolide gene clusters are thioesterase-like proteins with
unknown function in polyketide gene clusters (Haydock et al., 1991; Xue et al.,
1998; Butler et al., 1999; Tang et al., 1999). EryF encodes the erythronolide B C-6
hydroxylase (Figure 8) (Weber et al., 1991; Andersen and Hutchinson, 1992).
MegY does not have an ery counterpart but appears to belong to a (small) family
of O-acyltransferases that transfer short acyl chains to macrolides. Two classes
exist: AcyA and MdmB transfer acetyl or propionyl groups to the C-3 hydroxyls
on 16-membered macrolide rings (Arisawa et al., 1994; Hara and Hutchinson,
1992); CarE and Mpt transfer isovalerate or propionate to the mycarosyl moiety of
carbomycin and midecamycin, respectively (Epp et al., 1989; Arisawa et al., 1993;
Gu et al., 1996). The structures of various megalomicins suggest that MegY
belongs to the latter class and is the acyltransferase which converts megalomicin A
to megalomicins B, C1, or C2 (verified experimentally below).
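
The ORF counts given in the three gene categories above can be reconciled with the totals stated for the sequenced 48 kb segment (17 complete ORFs and one incomplete ORF at each end). The Python sketch below is only a bookkeeping check using numbers and gene names taken from the text; the variable names are hypothetical, and it does not identify which deoxysugar ORF is the partial one.

```python
# Complete ORFs per category, as stated in the text
complete_orfs = {
    "PKS (megAI, megAII, megAIII)": 3,
    "deoxysugar": 12,
    "other (megY, megH)": 2,
}

# Incomplete ORFs per category, as stated in the text
incomplete_orfs = {
    "deoxysugar (partial ORF)": 1,
    "other (megF)": 1,
}

# Deoxysugar-related genes named in the text (12 complete + 1 partial)
deoxysugar_genes = [
    "megBV", "megCIII", "megDI",               # glycosyltransferases
    "megCII", "megDVI",                        # putative 3,4-isomerases
    "megBII", "megDVII",                       # 2,3-reductases
    "megBIV", "megDV",                         # putative 4-ketoreductases
    "megT", "megDII", "megDIII", "megDIV",     # remaining deoxysugar ORFs
]

total_complete = sum(complete_orfs.values())
total_incomplete = sum(incomplete_orfs.values())

print("complete ORFs:", total_complete)
print("incomplete ORFs:", total_incomplete)
print("deoxysugar genes named:", len(deoxysugar_genes))
```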

Heterologous expression of the meg PKS genes.

The wild type and genetically modified versions of the ery DEBS have
been used extensively in heterologous Streptomyces hosts for enzyme studies and
the production of novel polyketide compounds. Given the similarities between the
ery and meg DEBSs, production characteristics were compared in a commonly
used Streptomyces host strain. The three megA ORFs were cloned into the
expression plasmid pKAO127'kan' (Ziermann and Betlach, 1999) in place of the
eryA ORFs. Both plasmids, pKAO127'kan' encoding ery DEBS and pKOS108-06
encoding meg DEBS, were introduced in Streptomyces lividans K4-114 and the
production of 6-dEB was determined in shake-flask fermentations. The production
profiles were similar in both cases and the maximum titer of 6-dEB was between
30-40 mg/L. In addition, both PKSs produced small amounts (~5%) of
8,8a-deoxyoleandolide, which results from the priming of the PKS with acetate
instead of propionate (Kao et al., 1994b). This observation indicates that the
loading AT domains of the PKSs display similar relaxed specificities towards
starter units.

Conversion of erythromycin to megalomicin in S. erythraea. 

An examination of the meg cluster revealed that the putative megosamine
biosynthetic genes are clustered directly upstream of the PKS genes. If the
hypothesis that these genes are sufficient for biosynthesis and attachment of
megosamine to an erythromycin intermediate is correct, then functional expression
of these genes in a strain which produces erythromycin, such as S. erythraea,
should result in production of megalomicin. A 12 kb DNA fragment carrying all
the genes between the leftmost Xho I site and the EcoR I site (Figure 9) was
integrated in the chromosome of S. erythraea using the site-specific integrating
vector pSET152 (Bierman et al., 1992). It was surmised that the left and right ends
of this fragment would contain necessary promoter regions for transcription of the
convergent set of genes in M. megalomicea and that they would likely operate in
S. erythraea.

Fermentation broth from S. erythraea/pKOS97-42, which contains the
integrated meg genes, was analyzed by LC/MS and compared to LC/MS profiles
of the parent S. erythraea strain without the meg genes, as well as to megalomicin
standards purified from M. megalomicea. The new strain was found to produce a
mixture of erythromycin A and various megalomicins (~4:1 ratio), thereby
showing that the predicted megosamine biosynthetic and glycosyltransferase genes
are contained within the cloned meg fragment. The two most abundant congeners
identified were megalomicins B and C1. Megalomicin A and C2 were also
detected in smaller amounts. The presence of the megalomicins B, C1 and C2 also
provides direct evidence for the function of the O-acyltransferase, MegY, which
is present in the integrated meg fragment.

Discussion 

The homologies observed among modular PKSs enabled the use of ery
PKS genes to clone the meg biosynthetic gene cluster from M. megalomicea. The
close similarities between the megalomicin and erythromycin biosynthetic
pathways are also reflected in the overall organization of their genes and in the high
degree of homology of the corresponding individual gene-encoded polypeptides.
Production of 6-dEB from meg DEBS in S. lividans and conversion of
erythromycin to megalomicin using the megD genes in S. erythraea provide
direct evidence that the identified gene cluster is responsible for synthesis of
megalomicin.

As seen in Figure 9, the ~40 kb segments of the two clusters beginning
with ery/megBV on the left through the ery/megF genes retain a nearly identical
organizational arrangement. The notable differences in this region are eryG and
IS1136, which are absent from the segment of the meg cluster analyzed. The eryG
gene encodes an S-adenosylmethionine (SAM)-dependent mycarosyl
methyltransferase that converts erythromycin C to erythromycin A (Figure 8)
(Weber et al., 1990; Haydock et al., 1991). The mycarose moiety is modified by
esterification (MegY) in megalomicin biosynthesis (Figure 8) and, therefore, the
absence of an eryG homolog would be expected in the meg cluster. The IS1136
element located between eryAI and eryAII (Donadio and Staver, 1993) is not
known to play a role in erythromycin biosynthesis and its origin in the ery cluster
has not been determined.

Upstream of the common meg/eryBIV and BV genes, the gene clusters 
diverge. The ~6 kb segment between eryBV and eryK, the left border of the ery 
gene cluster (Pereda et al., 1997), contains the remaining genes required for 
mycarose (eryBVI and BVII) and desosamine biosynthesis (eryCIV, CV, and CVI) 
and the C-12 hydroxylase (eryK) (Stassi et al., 1993). In contrast, the region 
upstream of megBV encodes a set of genes (megDI-DVII and megY) which can 
account for all the activities unique to megalomicin biosynthesis (Figure 9). Since 
introduction of this meg DNA segment into S. erythraea results in production of 
megalomicins, it is clear that these genes encode the functions for TDP-megosamine 
biosynthesis and transfer to its putative substrate erythromycin C, and 
for acylation of megalomicin A (Figure 8). The remaining region upstream of megDVI 
should therefore encode genes only for mycarose and desosamine biosynthesis. 

Olano et al. (1999) have recently described a pathway for 
biosynthesis of TDP-L-daunosamine, a deoxysugar component of the antitumor 
compounds daunorubicin and doxorubicin produced by Streptomyces peucetius. 
Their pathway proposes four steps from the intermediate TDP-4-keto-6-deoxyglucose 
controlled by the gene cluster dnmJQTUVZ, although the functions 
for dnmQ and dnmZ could not be identified and the precise order of reactions in 
the pathway could not be determined. The genes dnmT, dnmU, dnmJ and dnmV 
each have proposed counterparts in the meg cluster, megT, megDIV, megDII, and 
megDV, respectively (see Figure 10). 

It is possible to describe a pathway to convert TDP-2,6-dideoxy-3,4-diketo-D-hexose 
(or its enol tautomer), the last intermediate common to the 
mycarose and megosamine pathways, to TDP-megosamine through the sequence 
of 5-epimerization, 4-ketoreduction, 3-amination, and 3-N-dimethylation 
employing the genes megDIV, megDV, megDII, and megDIII. This employs the 
same functions proposed for biosynthesis of TDP-daunosamine by Olano et al., 
but in a different sequential order. However, it does not account for the megDVI 
and megDVII genes since their activities are not required for this route. A parallel 
pathway which employs these genes is also shown in Figure 10. In this alternate 
route, 2,3-reduction and 3,4-tautomerization are performed by the megDVII and 
megDVI gene products, respectively. A unified single pathway that employs both 
4-ketoreduction (megDV) and 2,3-reduction (megDVII) could not be determined. 
Because the entire gene set from megDVI through megDVII was introduced in S. 
erythraea to produce TDP-megosamine, it is not possible to determine which, if 
either, of the two alternative pathways is operative, but this can be addressed 
through systematic gene disruption and complementation. 

The 48 kb segment sequenced also contains genes required for synthesis of 
TDP-L-mycarose and TDP-D-desosamine (Figure 10). For the latter, megCII, which 
encodes a putative 3,4-isomerase, the first step in the committed TDP-desosamine 
pathway, appears to be translationally coupled to megAIII, almost exactly as its 
erythromycin counterpart, eryCII, was found translationally coupled to eryAIII 
(Summers et al., 1997). The high degree of similarity between MegCII and EryCII 
suggests that the pathways to desosamine in the megalomicin- and erythromycin-producing 
organisms are most likely the same. Similarly, the finding that megBII 
and megBIV, encoding a 2,3-reductase and 4-ketoreductase, have close 
homologs in the mycarose pathway for erythromycin also suggests that TDP-L-mycarose 
synthesis in the two host organisms is the same. 

Of interest are the two genes that encode putative 2,3-reductases, megBII 
and megDVII. Because MegBII most closely resembles EryBII, a known mycarose 
biosynthetic enzyme (Weber et al., 1990), and because megBII resides in the same 
location of the meg cluster as its counterpart in the ery cluster, megBII is assigned 
to the mycarose pathway and megDVII to the megosamine pathway. Furthermore, 
the lower degree of similarity between MegDVII and either EryBII or MegBII 
(Table 2) provides a basis for assigning the opposite L and D isomeric substrates 
to each of the enzymes (Figure 10). Finally, megT, which encodes a putative 2,3-dehydratase, 
is also related to a gene in the ery mycarose pathway, eryBVI. In S. 
erythraea, the proposed intermediate generated by EryBVI represents the first 
committed step in the biosynthesis of mycarose (Figure 10). However, the 
proposed pathways in Figure 10 suggest this may be an intermediate common to 
both mycarose and megosamine biosynthesis in M. megalomicea. Therefore, megT 
is named following the designation of the equivalent gene in the daunosamine 
pathway, dnmT (Olano et al., 1999). 

The preferred host-vector system for expression of meg DEBS described 
here has been used previously for the heterologous expression of modular PKS 
genes from the erythromycin (Kao et al., 1994a; Ziermann and Betlach, 1999), 
picromycin (Tang et al., 1999) and oleandomycin pathways, as well as for the 
generation of novel polyketide backbones where domains have been removed, 
added or exchanged in various combinations (McDaniel et al., 1999). Recently, 
hybrid polyketides have been generated through the co-expression of subunits 
from different PKS systems (Tang et al., 2000). 

Expression of the megDVI-megDVII segment in S. erythraea and the 
corresponding production of megalomicins in this host establishes the likely order 
of sugar attachment in megalomicin synthesis. Furthermore, it provides a means to 
produce megalomicin in a more genetically friendly host organism, leading to the 
creation of megalomicin analogs by manipulating the PKS. Over 60 6-dEB 
analogs have been produced by combinatorial biosynthesis using the ery PKS 
(McDaniel et al., 1999; Xue et al., 1999). The titers of megalomicin could also be 
significantly increased above the 5 mg/L obtained from M. megalomicea by 
introducing the genes into an industrially optimized strain of S. erythraea; many 
such strains can produce as much as 10 g/L of erythromycin. 

Example 2 

Stabilizing meg PKS Expression Plasmid by Codon Engineering 

Materials and methods 

All bacterial strains were cultured and transformed as described in 
Example 1. 

Fermentation of Streptomyces and diketide feeding 

Primary Streptomyces transformants were picked and placed in 6 mL of 
TSB liquid medium with 50 ng/L of thiostrepton and grown at 30°C. When the 
culture showed some growth (3-4 days), it was transferred into a 250 mL flask 
containing 50 mL of R6 medium (pH 7.0) with 25 µg/L of thiostrepton and 1 g/L of 
diketide ((2S,3R)-2-methyl-3-hydroxyhexanoate N-propionyl cysteamine thioester) 
and placed in a 30°C incubator for 7 days. 

Changing codons and making plasmids 

There are several identical sequences in the coding sequences for module 2 
and module 6 of the megalomicin PKS gene cluster. Expression plasmids 
containing the full length megalomicin PKS appeared to be somewhat unstable 
and subject to deletion in recA strains like ET12567 and Streptomyces by 
intra-plasmid homologous recombination. To prevent significant homologous 
recombination and so stabilize expression plasmids, the codons of two regions of 
the module 6 coding sequence that are identical to regions in the module 2 coding 
sequence were changed without changing the sequence of the protein encoded. The 
two regions changed in module 6 were from the 26,739th base to the 27,267th base and 
from the 27,697th base to the 27,987th base, which were identical to the region 
from the 6810th base to the 7338th base and the region from the 7778th base to the 
8068th base, respectively. The start codon of the loading domain of the meg PKS 
was set to be the 1st base. These sequences are shown below. 



> 6810-7338 Sequence in Module 2 

TTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG 
GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG 
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT 
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 23) 

> 26736-27267 Sequence in Module 6 

CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG 
GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG 
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT 
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 24) 

> 26736-27267 Sequence with Codon Changes 

CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC 
TCGGCCGTCAACCAAGACGGCGCGTCAAACGGCCTCGCCGCGCCCTCCGGCGTCGCCCAG 
CAGCGCGTCATACGCCGCGCGTGGGGACGCGCCGGAGTATCGGGCGGCGACGTCGGAGTC 
GTCGAGGCCCACGGCACCGGCACCCGCCTCGGGGATCCCGTCGAGCTGGGCGCCCTCCTG 
GGCACGTACGGCGTCGGCCGCGGCGGCGTCGGCCCGGTCGTCGTCGGCAGCGTCAAGGCC 
AACGTCGGCCACGTCCAGGCCGCGGCCGGCGTCGTCGGGGTCATCAAGGTCGTCCTCGGC 
CTCGGCCGCGGGCTGGTCGGCCCGATGGTCTGCCGCGGCGGCCTCAGCGGCCTCGTCGAC 
TGGTCGTCCGGCGGCCTGGTCGTCGCGGACGGGGTCCGCGGCTGGCCGGTCGGCGTCGAC 
GGCGTCCGCCGGGGCGGCGTCTCGGCGTTCGGCGTCAGCGGGACGAAT (SEQ ID NO: 25) 



> 6978-7337 Sequence in Module 2 

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT 
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 26) 

> 27697-27987 Sequence in Module 6 

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT 
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 27) 

> 27697-27987 Sequence with Codon Changes 

CGTGGAGTGCGATGCGGTCGTGTCGAGCGTCGTCGGCTTCAGCGTGCTGGGCGTCCTGGA 
GGGCCGCAGCGGCGCCCCGAGCCTGGACCGCGTCGACGTGGTCCAGCCGGTCCTGTTCGT 
GGTCATGGTCAGCCTGGCCCGCCTGTGGCGCTGGTGCGGCGTGGTCCCGGCCGCCGTGGT 
CGGCCACAGCCAGGGCGAGATCGCCGCCGCGGTCGTGGCCGGCGTCCTGAGCGTCGGCGA 
CGGCGCCCGCGTCGTGGCCCTGCGCGCCCGCGCCCTGCGCGCCCTGGCCGG (SEQ ID NO: 28) 
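
The codon changes described above can be checked computationally. The following is a minimal sketch, not the procedure used in this example: it takes the first 60 bases of SEQ ID NO: 24 and SEQ ID NO: 25, translates both with the standard codon assignments, and reports how much nucleotide identity remains. The function names are hypothetical, and the reading frame is assumed to begin at the first base of each excerpt.

```python
# Sketch: confirm that a recoded DNA segment is a synonymous rewrite of the
# original, and measure how much nucleotide identity was removed.
# Assumption: the reading frame starts at the first base of each excerpt.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))  # standard codon assignments


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def longest_identical_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive matching positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


# First 60 bases of the module 6 region before and after the codon changes.
ORIGINAL = "CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT"
RECODED = "CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC"

if __name__ == "__main__":
    print("same protein:", translate(ORIGINAL) == translate(RECODED))
    print("identity (%):", round(percent_identity(ORIGINAL, RECODED), 1))
    print("longest identical run:", longest_identical_run(ORIGINAL, RECODED))
```

Short identical runs matter here because homologous recombination requires a stretch of uninterrupted identity; the sketch only reports the run length and does not assume any particular threshold.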



Three pieces of DNA from the two regions above were synthesized and verified by 
Retrogen, and the synthesized DNAs were cloned into pCR-Blunt II-TOPO, as 
shown in Table 3 below. 

Table 3. Plasmids containing synthesized DNA 

Plasmid        Cloning sites and positions in meg PKS 
pKOS97-1613    PstI-BamHI, 26,739th-26,947th base 
pKOS97-1622    BamHI-BsmI, 26,947th-27,267th base 
pKOS97-1628    SfaNI-FseI, 27,697th-27,987th base 

Assembly of the expression plasmid 

First, ligation of the PstI-BamHI fragment of pKOS97-1613, the BamHI-BsmI 
fragment of pKOS97-1622 and BsmI-PstI linearized pKOS97-90 produced 
pKOS97-151. Then, the insertion of the SfaNI-FseI fragment of pKOS97-1628 
into pKOS97-151 gave rise to pKOS97-152. Then, the PstI-BlpI fragment of 
pKOS97-125 was used to replace the PstI-BlpI fragment of pKOS97-90a and 
produced pKOS97-160. 

The final expression plasmid (in pRM5), pKOS97-162, was the result of 
inserting the BglII-NheI fragment of pKOS97-160 into the BglII-NheI sites of 
pKOS108-04. 

Another expression plasmid, pKOS97-152a, was made by a four-fragment 
ligation. The four fragments were a BlpI-XbaI fragment (containing a cos site) of 
pKOS97-92a, a BglII-PstI fragment of pKOS97-81, a PstI-BlpI fragment of 
pKOS97-152, and a BglII-XbaI fragment of pKOS108-04 (as the vector). 

Tests of the constructed plasmids showed that the plasmids containing the 
modified coding sequences were more stable than plasmids containing unmodified 
coding sequence. 

Example 3 

Construction of Ole-Meg Hybrid PKS 

Construction of pRM1-based pKOS98-48 for the expression of OlePKS modules 
1-4. 

The 240-bp fragment containing the 3'-end portion of the oleAII gene (at nt 
11210-11452; the first base of the start codon of oleAII is nt 1) was PCR amplified 
with primers N98-38-1 (5'-GAACAACTCCTGTCTGCGGCCGCG-3') (SEQ ID 
NO: 29) and N98-38-3 
(5'-CGGAATTCTCTAGAGTCACGTCTCCAACCGCTTGTCGAGG-3') (SEQ ID 
NO: 30). The fragment contains a naturally occurring NotI site at its 5'-end and 
the engineered XbaI (bold) and EcoRI sites (underline) at its 3'-end following the 
oleAII stop codon. pKOS38-189 was digested with EcoRI and NotI to give five 
fragments of 8 kb, 5 kb, 4 kb, 2.5 kb and 2 kb. The 8-kb EcoRI-NotI fragment 
containing oleAII gene nt 2961 to nt 11210 and the 240-bp NotI, EcoRI treated 
PCR fragment were ligated into Litmus 28 at the EcoRI site via a three-fragment 
ligation to give pKOS98-46. The 8.2-kb EcoRI fragment from pKOS98-46 was 
cloned into pKOS38-174, a pRM1-derived plasmid containing oleAI and nt 1 to nt 
2960 of oleAII, to give pKOS98-48. 
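
The restriction sites named for the two oleAII primers can be located with a simple string search. The sketch below is illustrative only; the helper name find_sites is hypothetical, the recognition sequences are the standard ones for EcoRI, XbaI and NotI, and positions are reported as 0-based offsets from the 5'-end of each primer.

```python
# Sketch: locate restriction enzyme recognition sequences in a primer.
# Positions are 0-based offsets from the 5' end of the primer as written.

RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "NotI": "GCGGCCGC",
}


def find_sites(seq: str, sites: dict = RECOGNITION_SITES) -> dict:
    """Return {enzyme: [start offsets]} for every site found in seq."""
    seq = seq.upper()
    found = {}
    for enzyme, site in sites.items():
        hits = [i for i in range(len(seq) - len(site) + 1)
                if seq.startswith(site, i)]
        if hits:
            found[enzyme] = hits
    return found


# Primer sequences as given in the text (SEQ ID NO: 29 and SEQ ID NO: 30).
N98_38_1 = "GAACAACTCCTGTCTGCGGCCGCG"
N98_38_3 = "CGGAATTCTCTAGAGTCACGTCTCCAACCGCTTGTCGAGG"

if __name__ == "__main__":
    print("N98-38-1:", find_sites(N98_38_1))
    print("N98-38-3:", find_sites(N98_38_3))
```

The search reads only the strand as written; all three recognition sequences are palindromic, so the complementary strand does not need to be scanned separately.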
Construction of pSET152-based pKOS98-60 for the expression of megPKS 
modules 5-6. 

The 360-bp fragment containing nt 1 to nt 366 of megAIII was PCR 
amplified with primers N98-40-3 
(5'-TCTAGACTTAATTAAGGAGGACACATATGAGCGAGAGCAGCGGCATGACCG-3') 
(SEQ ID NO: 31) and N98-40-2 (5'-AACGCCTCCCAGGAGATCTCCAGCA-3') 
(SEQ ID NO: 32). A PacI site and an NdeI site as well 
as the ribosome binding site were introduced at the 5'-end of the megAI start 
codon. The 360-bp PacI-BglII fragment was inserted into pKOS108-06, replacing 
the 22-kb PacI-BglII fragment, to yield pKOS98-55. The 10-kb PacI-XbaI 
fragment containing the megAIII gene and the annealed oligos N98-23-1 
(5'-AATTCATAGCCTAGGT-3') (SEQ ID NO: 33) and N98-23-2 
(5'-CTAGACCTAGGCTATG-3') (SEQ ID NO: 34) were ligated to PacI and EcoRI 
treated pSET152 derivative pKOS98-14 via a three-fragment ligation to give 
pKOS98-60. 
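
The annealed oligos N98-23-1 and N98-23-2 act as a short linker, and the single-stranded ends they leave after annealing can be worked out from the two sequences. The following is a rough sketch under the assumption that the oligos pair so as to leave 5' overhangs on both ends; the function names are hypothetical and do not come from the text.

```python
# Sketch: find the duplex region formed by two annealed oligos and report
# the 5' single-stranded overhang left on each strand.
# Assumption: the oligos anneal so that both ends carry 5' overhangs.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def five_prime_overhangs(top: str, bottom: str) -> tuple:
    """Return (top 5' overhang, bottom 5' overhang) for annealed oligos.

    Both oligos are written 5'->3'. The top strand is slid along the
    reverse complement of the bottom strand until the remainder pairs.
    """
    top, bottom = top.upper(), bottom.upper()
    rc_bottom = reverse_complement(bottom)
    for k in range(len(top)):
        overlap = len(top) - k
        if overlap <= len(rc_bottom) and top[k:] == rc_bottom[:overlap]:
            return top[:k], bottom[:len(bottom) - overlap]
    raise ValueError("oligos do not anneal with 5' overhangs")


# Oligo sequences as given in the text (SEQ ID NO: 33 and SEQ ID NO: 34).
N98_23_1 = "AATTCATAGCCTAGGT"
N98_23_2 = "CTAGACCTAGGCTATG"

if __name__ == "__main__":
    top_end, bottom_end = five_prime_overhangs(N98_23_1, N98_23_2)
    print("top strand 5' overhang:   ", top_end)
    print("bottom strand 5' overhang:", bottom_end)
```

For these two oligos the sketch reports AATT and CTAG, the four-base 5' overhangs that EcoRI-cut and XbaI-cut DNA carry, which is consistent with joining the PacI-XbaI fragment to the PacI and EcoRI treated vector.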

Example 4 

Conversion of Erythronolides to Erythromycins 

A sample of a polyketide (~50 to 100 mg) is dissolved in 0.6 mL of 
ethanol and diluted to 3 mL with sterile water. This solution is used to overlay a 
three day old culture of Saccharopolyspora erythraea WHM34 (an eryA mutant) 
grown on a 100 mm R2YE agar plate at 30°C. After drying, the plate is incubated 
at 30°C for four days. The agar is chopped and then extracted three times with 100 
mL portions of 1% triethylamine in ethyl acetate. The extracts are combined and 
evaporated. The crude product is purified by preparative HPLC (C-18 reversed 
phase, water-acetonitrile gradient containing 1% acetic acid). Fractions are 
analyzed by mass spectrometry, and those containing pure compound are pooled, 
neutralized with triethylamine, and evaporated to a syrup. The syrup is dissolved 
in water and extracted three times with equal volumes of ethyl acetate. The 
organic extracts are combined, washed once with saturated aqueous NaHCO3, 
dried over Na2SO4, filtered, and evaporated to yield ~0.15 mg of product. The 
product is a glycosylated and hydroxylated compound corresponding to 
erythromycin A, B, C, and D but differing therefrom as the compound provided 
differed from 6-dEB. 

Example 5 

Measurement of Antibacterial Activity 

Antibacterial activity is determined using either disk diffusion assays with 
Bacillus cereus as the test organism or by measurement of minimum inhibitory 
concentrations (MIC) in liquid culture against sensitive and resistant strains of 
Staphylococcus pneumoniae. 

Example 6 

Evaluation of Antiparasitic Activity 

Compounds can initially be screened in vitro using cultures of P. falciparum 
FCR-3 and K1 strains, then in vivo using mice infected with P. berghei. Mammalian 
cell toxicity can be determined in FM3A or KB cells. Compounds can also be 
screened for activity against P. berghei. Compounds are also tested in animal studies 
and clinical trials to test the antiparasitic activity broadly (antimalarial, 
trypanosomiasis and Leishmaniasis). 



The invention having now been described by way of written description 
and example, those of skill in the art will recognize that the invention can be 
practiced in a variety of embodiments and that the foregoing description and 
examples are for purposes of illustration and not limitation of the following 
claims. 

Claims 

1. An isolated nucleic acid comprising a nucleotide sequence 
encoding a domain of megalomicin polyketide synthase (PKS) or a megalomicin 
modification enzyme. 

2. The isolated nucleic acid of claim 1, which encodes a PKS open 
reading frame (ORF) selected from the group consisting of megAI, megAII and 
megAIII. 

3. The isolated nucleic acid of claim 1, wherein the PKS domain is 
selected from the group consisting of a TE domain, a KS domain, an AT domain, 
an ACP domain, a KR domain, a DH domain, and an ER domain. 

4. The isolated nucleic acid of claim 1, wherein the nucleic acid 
comprises the coding sequence for a loading module, a thioesterase domain, and 
all six extender modules of megalomicin PKS. 

5. The isolated nucleic acid of claim 1, which encodes a megalomicin 
modification enzyme that is involved in the conversion of 6-dEB into a 
megalomicin. 

6. The isolated nucleic acid of claim 5, which encodes a megalomicin 
modification enzyme that is involved in the biosynthesis of mycarose, 
megosamine or desosamine. 

7. The isolated nucleic acid of claim 1, wherein the nucleic acid 
codons of homologous regions within the PKS or the megalomicin modification 
enzyme coding sequence have been changed to reduce or abolish the homology 
without changing the amino acid sequences encoded by said changed nucleic acid 
codons. 

8. The isolated nucleic acid of claim 1, which isolated nucleic acid 
fragment hybridizes to a nucleic acid having a nucleotide sequence set forth in 
SEQ ID NO: 1. 

9. A polypeptide, which is encoded by the isolated nucleic acid 
fragment of claim 1. 

10. A recombinant DNA expression vector, comprising the isolated 
nucleic acid of claim 1 operably linked to a promoter. 

11. A recombinant host cell, comprising the recombinant DNA 
expression vector of claim 10. 

12. The recombinant host cell of claim 11, which is a Streptomyces or 
Saccharopolyspora host cell. 

13. A recombinant host cell of claim 11, which comprises: 

a) at least two separate autonomously replicating recombinant DNA 
expression vectors, each of said vectors comprises a recombinant DNA compound 
encoding a megalomicin PKS domain or a megalomicin modification enzyme 
operably linked to a promoter; or 

b) at least one autonomously replicating recombinant DNA expression 
vector and at least one modified chromosome, each of said vector(s) and each of 
said modified chromosome comprises a recombinant DNA compound encoding a 
megalomicin PKS domain or a megalomicin modification enzyme operably linked 
to a promoter. 

14. A hybrid PKS that comprises a polypeptide of claim 9 and is 
composed of at least a portion of a megalomicin PKS and at least a portion of a 
second PKS for a polyketide other than megalomicin. 

15. The hybrid PKS of claim 14, wherein the second PKS is selected 
from the group consisting of a narbonolide PKS, an oleandolide PKS, and a DEBS 
PKS. 

16. The hybrid PKS of claim 15 that is composed of the megAI and 
megAII gene products and the oleAIII gene product. 

17. The hybrid PKS of claim 16, wherein the KS domain of module 1 
of the megAI gene product has been inactivated by mutation. 

18. A method of producing a polyketide, which method comprises 
growing the recombinant host cell of claim 11 under conditions whereby the 
megalomicin PKS domain encoded by the recombinant expression vector is 
produced and the polyketide is synthesized by the cell, and recovering the 
synthesized polyketide. 

19. A recombinant host cell that comprises a recombinant expression 
vector that encodes a megalomicin modification enzyme. 

20. The recombinant host cell of claim 19 that produces megosamine 
and can attach megosamine to a polyketide, wherein said host cell, in its naturally 
occurring non-recombinant state, cannot produce megosamine. 
[Structure drawings not reproduced; surviving label: Erythromycin A] 

Structures of the Megalomicins and Azithromycin 

Figure 3 
[Diagram not reproduced; surviving labels: module 1, module 2, module 3, 
module 4, module 5, module 6, release, 6-deoxyerythronolide B] 

AT = acyltransferase 
ACP = acyl carrier protein 
KS = ketosynthase 
KR = ketoreductase 
DH = dehydratase 
ER = enoyl reductase 
TE = thioesterase 

Biosynthesis of 6-Deoxyerythronolide B (6-dEB), the Aglycone of Erythromycin, by a 
Modular PKS 

Figure 4 
LOCUS       1 47981 bp DNA 01-MAY-2000 
DEFINITION  Megalomicin biosynthetic gene cluster, polyketide synthase, 
            desosamine, megosamine, and mycarose biosynthesis genes, 
            1 
ACCESSION 
VERSION 
KEYWORDS 
SOURCE      Micromonospora megalomicea. 
  ORGANISM  Micromonospora megalomicea 
            Unclassified. 
REFERENCE   1 (bases 1 to 47981) 
  AUTHORS   Volchegursky,Y., Hu,Z., Katz,L. and McDaniel,R. 
  TITLE     Biosynthesis of the Anti-Parasitic Agent Megalomicin; 
            Transformation of Erythromycin to Megalomicin in Saccharopolyspora 
            erythraea 
  JOURNAL   Unpublished 
REFERENCE   2 (bases 1 to 47981) 
  AUTHORS   McDaniel,R. and Volchegursky,Y. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (01-MAY-2000) Kosan Biosciences, Inc., 3828 Bay Center 
            Place, Hayward, CA 94545, USA 
FEATURES             Location/Qualifiers 
     source          1..47981 
                     /organism="Micromonospora megalomicea" 
                     /strain="NRRL3275" 
                     /sub_species="nigra" 
     gene            complement(<1..144) 
                     /gene="megT" 
     CDS             complement(<1..144) 
                     /gene="megT" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-4-keto-6-deoxyglucose-2,3-dehydratase" 
                     /translation="MGDRVNGHATPESTQSAIRFLTRHGGPPTATDDVHDWLAHRAAE 
                     [RLE" (SEQ ID NO: 2) 
     gene            928..2061 
                     /gene="megDVI" 
     CDS             928..2061 
                     /gene="megDVI" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-4-keto-6-deoxyhexose 3,4-isomerase" 
                     /translation="MAVGDRRRLGRELQMARGLYWGFGANGDLYSMLLSGRDDDPWTW 
                     YERLRAAGRGPYASRAGTWWGDHRTAAEVLADPGFTHGPPDAARWMQVAHCPAASWA 
                     GPFREFYARTEDAASVTVDADWLQQRCARLVTELGSRFDLVNDFAREVPVLALGTAPA 
                     LKGVDPDRLRSWTSATRVCLDAQVSPQQLAVTEQALTALDEIDAVTGGRDAAVLVGW 
                     AELAANTVGNAVTjAVTELPELAARLADDPETATRVVTEVSRTSPGVHLERRTAASDRR 
                     VGGTOVPTGGEVTVWAAANRDPEVFTDPDRFDVDRGGDAEILSSRPGSPRTDLDALV 
                     ATLATAALRAAAPVLPRLSRSGPVIRRRRSPVARGLSRCPVEL" (SEQ ID NO: 3) 
     gene            2072..3382 
                     /gene="megDI" 
     CDS             2072..3382 
                     /gene="megDI" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-megosamine glycosyltransferase" 
                     /translation="MRWFSSMAVNSHLFGLVPLASAFQAAGHEVRWASPALTDDVT 
                     GAGLTAVPVGDDVELVEWHAHAGQDIVEYMRTLDWVDQSHTTMSWDDLLGMQTTFTPT 
                     FFALMSPDSLIDGMVEFCRSWRPDWIVWEPLTFAAPIAARVTGTPHARMLWGPDVATR 
                     ARQSFLRLLAHQEVEHREDPLAEWFDWTLRRFGDDPHLSFDEELVLGQWTVDPIPEPL 
                     RIDTGVRTVGMRYVPYNGPSWPAWLLREPERRRVCLTLGGSSREHGIGQVSIGEMLD 
                     AIADIDAEFVATFDDQQLVGVGSVPANVRTAGFVPMNVLLPTCAATVHHGGTGSWLTA 
                     AIHGVPQ11LSDADTEVHAKQLQDLGAGLSLPVAGMTAEHLRGAIERVLDEPAYRLGA 
                     ERMRDGMRTDPSPAQWGICQDLAADRAARGRQPRRTAEPHLPR" (SEQ ID NO: 4) 
     gene            3462..4634 
                     /gene="megY" 
     CDS             3462..4634 
                     /gene="megY" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="mycarose O-acyltransferase" 
                     /translation="MVTSTNLDTTAJ^PALNSLTGMRFVAAFLVFFTHVLSRLIPNSYV 

Y ADGLD AFWQTTGRVGVS FFF I LSG F VLTWS ARASDS VWS F WRRRVCKLFPNHIjVTAF 

AAWLFLVTGQAVSGEAIilPNLLrLlHAWFPALEISFGINPVSWSLACEAFFYLCFPLF 

LFWISGIRPERLWAWAJVWFAAIWAVPWADLLLPSSPPLIPGLEYSAIQDWFLYTFP 

ATRSLEFILGIILARILITGRWINVGLLPAVLLFPVFFVASIiFLPGVYAISSSMMILP 

LVLI IASGATADLQQKRTFMRNRV>P/WLGDVSFAIjYMVHFIjVIVYGADLLGFSQTEDA 

PLGLALFMIIPFIAVSLVLSWLLYRFVELPVMRNWARPASARRKPATEPEQTPSRR" (SEQ ID NO: 5) 
     gene            4651..5775 
                     /gene="megDII" 
     CDS             4651..5775 
                     /gene="megDII" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-3-keto-6-deoxyhexose 3-aminotransaminase" 
                     /translation="MTTYVWSYLLEYERERADILDAVQKVFASGSLILGQSVENFETE 
YARYHGIAHCVGVDNGTNAVKLALESVGVGRDDEVVTVSNTAAPTVLAIDEIGARPVF 
VDWDEDYL^TDLVEAAVTPRTKAIVP\mLYGQCVDMTAIjRELADRRGLKLVEDCAQ 
AHGARRDGRLAGTMSDAAAFSFYPTKVLGAYGDGGAVVTNDDETARALRRLRYYGMEE 
VY YVTRTPGHNS RLDEVQAE I LRRKLTRLDAYVAGRRAVAQRYVDGLADLQDSHGLEL 
P WTDGNEHVF YVYWRHPRRDE 1 1 KRLRDGYD ISLN I S YPWPVHTMTG FAHLGVASG 
SLPVTERLAGEIFSLPMYPSLPKDLQDRVTEAVREVITGL" (SEQ ID NO: 6) 
     gene            5822..6595 
                     /gene="megDIII" 
     CDS             5822..6595 
                     /gene="megDIII" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="daunosaminyl-N,N-dimethyltransferase" 
                     /translation="MPNSHSTTSSTDVAPYERADIYHDFYHGRGKGYRAEADALVEVA 

RKHTPQAATLLDVACGTGSHLVELADSFREVVGVDLSAAhlLATAARNDPGRELHQGDM 

RDFSLDRRFDWTCMFSSTGYIiVDEAELDRAVANLAGHLAPGGTLWEPWWFPETFRP 

GWVGADLVTSGDRRISRMSHTVPAGLPDRTASRMTIHYTVGSPEAGIEHFTEVHVMTL 

FARAAYEQAFQRAGLSCSYVGHDLFSPGLFVGVAAEPGR" (SEQ ID NO: 7) 

     gene            6592..7197 
                     /gene="megDIV" 
     CDS             6592..7197 
                     /gene="megDIV" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-4-keto-6-deoxyhexose 3,5-epimerase" 
                     /translation="MRVEELGIEGVFTFTPQTFADERGVFGTAYQEDVFVAALGRPLF 
PVAQVSTTRSRRGVVRGraFTTMPGSMAKYVYCARGRAMDFAVDIRPGSPTFGRAEPV 
ELSAESMVGLYLPVGMGHLFVSLEDDTTLVYLMSAGYVPDKERAVHPLDPEIiALPIPA 
DLDLVMSERDRVAPTLREARDQGILPDYAACRAAAHRWRT" (SEQ ID NO: 8) 

     gene            7220..8206 
                     /gene="megDV" 
     CDS             7220..8206 
                     /gene="megDV" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-4-keto-6-deoxyhexose 4-ketoreductase" 
                     /translation="MWLGASGFLGSAVTHALADLPVRVRLVARREVWPSGAVADYE 

THRVDLTE PG ALAE WADARAVFPFAAQ I RGTSG WR I S EDD WAERTWVGLVRDL I AV 

LSRSPHAPVWFPGSNTQVGRVTAGRVIDGSEQDHPEGVYDRQKHTGEQLLKEATAAG 

AlRATSLRLPPVFGVPAAGTADDRGWSTMIRRAIiTGQPLTMWHDGTVRRELLYVTDA 

ARAFVTALDHADALAGRHFLLGTGRSWPLGEVFQAVSRSVARHTGEDPVPWSVPPPA 

HMDPSDLRSVEVDPARFTAVTGWRATVT^1AEAVDRTVAALAPRRAAAPSEPS" (SEQ ID NO: 9) 
     gene            complement(8228..9220) 
                     /gene="megDVII" 
     CDS             complement(8228..9220) 
                     /gene="megDVII" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-4-keto-6-deoxyhexose 2,3-reductase" 

/translation="MGTTGAGSAKVRVGRSALHTSRLWLGTVNFSGRVTDDDALRLMD 
HALERGVNC I DTAD I YGWRLYKGHTEELVGRWFAQGGGRREETVLATKVGS EMS ERVN 

DGGLSARHIVAACENSLRRLGVDHIDIYQTHHIDRAAPWDEVWQAAEHLVGSGKVGYV 

GSSNLAGWHIAAAQESAARRNLLGMISHQCLYNLAVRHPELDVLPAAQAYGVGVFAWS 

PLHGGLLSGVLEKIAAGTAVKSAQGRAQVIiLPAVRPLVEAYEDYCRRLGADPAEVGLA 

WVLSRPGILGAVIGPRTPEQLDSAIiRAAELTLGEEELRELEAIFPAPAVDGPVP" (SEQ ID NO: 10) 
     gene            complement(9226..10479) 
                     /gene="megBV" 
     CDS             complement(9226..10479) 
                     /gene="megBV" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-mycarose glycosyltransferase" 
                     /translation="MRVLLTSFAHRTHFQGLVPIiAWAXiHTAGHDVRVASQPELTDVVV 

GAGLTSVPLGSDHRLFDISPEAAAQVHRYTTDLDFARRGPELRSWEFLHGIEEATSRF 

VFPWNNDSFVDELVEFAMDWRPDLVLWEPFTFAGAVAAKACGAAHARLLWGSDLTGY 

FRSRSQDLRGQRPADDRPDPLGGWLTEVAGRFGLDYSEDIiAVGQWSVDQLPESFRLET 

GLESVHTRTLPYNGSSWPQWLRTSDGVRRVCFTGGYSAliGITSNPQEFLRTLATLAR 

FDGEIWTRSGLDPASVPDNVRLVDFVPMNILLPGCAAVIHHGGAGSWATALHHGVPQ 

ISVAHEWDCVLRGQRTAELGAGVFLRPDEVDADTLWQALATVVEDRSHAENAEICLRQE 
ALAAPTPAEVVPVLEALAHQHRADR" (SEQ ID NO: 11) 
     gene            complement(10483..11424) 
                     /gene="megBIV" 
     CDS             complement(10483..11424) 
                     /gene="megBIV" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="TDP-4-keto-6-deoxyhexose 4-ketoreductase" 
                     /translation="MTRHVTLLGVSGFVGSALLREFTTHPLRLRAVARTGSRDQPPGS 

AG I EHLR VDLLE PGR VAQ VVADTD VVVHLVAY AAGGS T WRS AATV P E AE RVNAG I MRD 

LVAALRARPGPAPVXrLFASTTQAANPAAPSRYAQHKIEAERILRQATEDGWDGVILR 

LPAIYGHSGPSGQTGRGWTAMIRRAIiAGEPITMWHEGSVRRNLLHVEDVATAFTAAL 

HNHEALVGDWTPSADEARPLGEIFETVAASVARQTGNPAVPWSVPPPENAEANDFR 

SDDFDSTEFRTLTGWHPRVPLAEGIDRTVAAIilSTKE" (SEQ ID NO: 12) 

     gene            12181..22821 
                     /gene="megAI" 
     CDS             12181..22821 
                     /gene="megAI" 
                     /note="polyketide synthase" 
                     /codon_start=1 
                     /transl_table=11 
                     /product="megalomicin 6-deoxyerythronolide B synthase 1" 

/translation="MVDVPDLLGTRTPHPGPLPFPWPLCGHNEPELRARARQLHAYIjE 

G I SEDDWAVGAALARETRAQDG PHRAVWASS VTELTAALAALAQGRPHPSWRGVA 

RPTAPWFVIiPGQGAQWPG^TRLLAESPVFAAAMRACERAFDEVTDWSLTEVLDSPE 

HLRRVE WQPALFAVQTS LAALWRS FG VRPDAVLGHS IGELAAAE VCGAVDVE AAARA 

AALWSREMVPLVGRGD^AAVALSPAELAARVERWDDDWPAGVNGPRSVLIjTGAPEPI 

ARRVAELAAQGVTIAQVVNVSMAAHSAQVDAVAEGMR 

RLDTRELGADHWPRSFRLPVRFDEATRAVLELQPGTFIESSPHPVLlAASLQQTLDEVG 
SPAAIVPTLQRDQGGLRRFLLAVAQAYTGGVTVDWTAAYPGVTPGHLPSAVAVETDEG 
PSTEFDWAAPDHVLRARLLEIVGAETAALAGREVDARATFRELGLDSVLiAVQLRTRLA 
TATGRDLHIAMLYDHPTPHALTEALLRGPQEEPGRGEETAHPTEAEPDEPVAWAMAC 
RLPGGVTSPEEFWELLAEGRDAVGGLPTDRGWDLDSLFHPDPTRSGTAHQRAGGFLTG 
ATSFDAAFFGLSPREALAVEPQQRITLELSWEVLERAGIPPTSLRTSRTGVFVGLIPQ 
EYGPRLAEGGEGVEGYLMTGTTTSVASGRVAYTLGLEGPAISVDTACSSSLVAVHLAC 
QSLRRGESTr4ALAGG\n?VWPTPGMLVDFSRhmSLAPDGRSKAFSAAADGFGMAEGAGM 
LLLERLSDARRHGHPVXJVVIRGTAVNSDGASNGLSAPNGRAQVRVIRQALAESGIiTPH 
TVDWETHGTGTRLGDPIEARALSDAYGGDREHPLRIGSVKSMIGHTQAAAGVAGLIK 

LVLAMQAGVLPRTLHAJDEPS PE I DWS SG AI S LLQE P AAW P AGERPRRAG VS S FG I SGT 

NAHAIIEEAPPTGDDTRPDRMGPWPWVLSASTGEALRARAARLAGHLREHPDQDLDD 

VAYSLATGRAAIiAYRSGFVPADASTALRILDELAAGGSGDAVTGTARAPQRVVFVFPG 

QGWQWAGMAVDLLDGDPVFASVLRECADALEPYIiDFEIVPFLRAEAQRRTPDHTLSTD 

RVDWQPVLFAVMVS LAARWRAYG VE P AAV IGHSQGEX AAAC VAGAL S LDDAARAVAL 

RSRVIATMPGNGAMASIAASVDEVAARIDGRVEIAAVNGPRAVWSGDRDDLDRLVAS 

CTVEGVRAKRLPVDYASHSSHVEAVT^DALHAELGEFRPLPGFVPFYSTVTGRWVEPAE 

LDAGYWFRNLRHRVRFADAVRSLADQGYTTFLEVSAHPVLTTAIEEIGEDRGGDLVAV 

HSLRRGAGGPVDFGSALARAFVAGVAVDVJESAYQGAGARRVPLPTYPFQRERFWLEPN 

PARRVADSDDVSSLRYRIEWHPTDPGEPGRLDGTWLLATYPGRADDRVEAARQALESA 

GARVEDLWEPRTGRVDLVRRLDAVGPVAGVLCLFAVAEPAAEHSPLAVTSLSDTLDL 

TQAVAGSGRECPIWWTENAVAVGPFERLRDPAHGALV7ALGRWALENPAVWGGLVDV 

PSGSVAELSRHIiGTTLSGAGEDQVALRPDGTYARRWCRAGAGGTGRWQPRGTVLVTGG 

TGG VGRHVAR W L ARQGT PCLVL AS RRG PD ADG VEE LLTE L ADLGTRATVT ACD VTDRE ■ 

QLRAIiLATVDDEHPLSAVFHVAATLDDGTVETLTGDRIERANRAKVLGARNLHELTRD 

ADLDAFVLFSSSTAAFGAPGLGGYVPGNAYLDGLAQQRRSEGLPATSVAWGTWAGSGM 

AEGPVADRFRRHGVMEMHPDQAVEGLRVALVQGEVAPIWDIRWDRFLLAYTAQRPTR 

LFDTLDEARRAAPGPDAGPGVAAIiAGLPVGEREKAVLDLVRTHAAAVLGHASAEQVPV 

DRAFAELGVDSLSALELRNRLTTATGVRLATTTVFDHPDVRTLAGHLAAELGGGSGRE 

RPGGEAPTVAPTDEP I AI VGMACRLPGGVDS PEQLWEL I VSGRDTAS AAPGDRS WDPA 

ELMVSDTTGTRTAFGNFMPGAGEFDAAFFGISPREALAMDPQQRHALETTWEAIjENAG 

irpeslrgtdtgvfvgmshqgyatgrpkpedevdgylltgntasvasgriayvlgleg 

paitvdtacssslvalhvaagslrsgdcglavaggvsvmagpevfrefsrqgalapdg 

rckpfsdeadgfglgegsafwlqrlsvavregrrvlgvwgsavnqdgasnglaaps 

gvaqqrvirrawgragvsggdvgweahgtgtrlgdpvelgallgtygvgrggvgpw 

vgsvkanvghvqaaagwgvikvvlglgrglvgpmvcrgglsglvdwssgglvvadgv 

rgwpvgvdgvrrggvsafgvsgtnahvvvaeapgsvvgaerpvegssrglvgvvggvv 

pvvlsaktetalhaqarrladhlethpdvpmtdvvwtltqarqrfdrravllaadrtq 

averlrglaggepgtgwsgvasgggwfvfpgqggqwvgmargllsvpvfveswec 

dawsswgfsvlgvlegrsgapsldrvdwqpvlfvvmvslarlwrwcgvvpaawg 

hsqgeiaaavvagvlsvgdgarvvaijraraijralaghggmasvrrgrddvqkijiidsgp 

wtgkiieiaavngpdavwsgdpravtelvehcdgigvrartipvdyashsaqveslre 

ells^agiegrpatvpfystltggfvdgteldadywyrnlrhpvrfhaavealaard 

lttfvevsphpvtjsmavgetladvesavtvgtlerdtddverfltslaeahvhgvpvd 

waavlgsgtlvdlptypfqgrrfwlhpdrgprddvadwfhrvdwtatatdgsarldgr 

WLVWPEGYTDDGWWEVRAALAAGGA^PWTTVEEVTDRVGDSDAWSMLGLADDGA 
AETLALLRRLDAQASTTPLWWTVGAVAPAGPVQRPEQATWJGLALVASLERGHRWTG 
LLDLPQTPDPQIfRPRLVEALAGAEDQVAVRADAVKARRIVPTPVTGAGPYTAPGGTIIi 
VTGGTAGLGAVTARWLAERGAEHLALVSRRGPGTAGVDEWRDIjTGLGVRVSVHSCDV 
GDRESVGALVQELTAAGDVVRGVVHAAGLPQQVPIiTDMDPADLADWAVKVDGAVHIA 
DLCPEAELFLLFSSGAGVWGSARQGAYAAGNAFLDAFARHRRDRGLPATSVAWGLWAA 
GGMTGDQEAVSFLRERGVRPMSVPRALEALERVLTAGETAWVADVDWAAFAESYTSA 
RPRPLLHRLVTPAAAVGERDEPREQTLRDRLAAIiPRAERSAELVRLVRRDAAAVLGSD 
AKAVPATTPFKDLGFDSLAAVRFRNRLAAHTGLRLPATLVFEHPNAAAVADLLHDRLG 
EAGEPTPVRSVGAGLAALEQALPDASDTERVELVERLERMLAGLRPEAGAGADAPTAG 
DDLGEAGVDELLDALERELDAR" (SEQ ID NO: 13) 
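
The /translation qualifiers in this listing can be cross-checked against the nucleotide sequence printed under ORIGIN. The following is a minimal sketch in Python, not part of the original listing. It assumes that the amino-acid assignments of translation table 11 match the standard genetic code (the two differ in their permitted start codons), and the helper names `clean` and `translate` are hypothetical.

```python
# Codon table built from the compact "tcag" ordering of the standard code.
# Assumption: table 11 assigns the same amino acids as the standard code.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def clean(seq_text):
    """Drop position numbers, spaces and anything that is not a, c, g or t."""
    return "".join(ch for ch in seq_text.lower() if ch in "acgt")


def translate(dna):
    """Translate in frame 1, ignoring any trailing partial codon."""
    dna = clean(dna)
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 12841..12900 as printed in the ORIGIN block of this listing.
fragment = "12841 ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc"
print(translate(fragment))  # LWSREMVPLVGRGDMAAVAL
```

Translating a short window in this way is a convenient check on individual residues of the printed protein sequence, since the nucleotide and protein records are independent renderings of the same region.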

misc_feature 12505..13470
/gene="megAI"
/function="AT-L"
misc_feature 13576..13791
/gene="megAI"
/function="ACP-L"
misc_feature 13849..15126
/gene="megAI"
/function="KS1"
misc_feature 15427..16476
/gene="megAI"
/function="AT1"
misc_feature 17155..17694
/gene="megAI"
/function="KR1"
misc_feature 17947..18207
/gene="megAI"
/function="ACP1"
misc_feature 18268..19548
/gene="megAI"
/function="KS2"
misc_feature 19876..20910
/gene="megAI"
/function="AT2"
misc_feature 21517..22053
/gene="megAI"
/function="KR2"
misc_feature 22318..22575
/gene="megAI"
/function="ACP2"
gene 22867..33555
/gene="megAII"
CDS 22867..33555
/gene="megAII"
/note="polyketide synthase"
/codon_start=1
/transl_table=11
/product="megalomicin 6-deoxyerythronolide B synthase 2"
/translation="MTDNDKVAEYLRRATLDLRAARKRLRELQSDPIAWGKACRLPG

GVHLPQHLWDLLRQGHETVSTFPTGRGWDIiAGLFHPDPDHPGTSYVDRGGFIiDDVAGF 

DAEFFGISPREATAMDPQQRLLLETSWELVESAGiDPHSLRGTPTGVFLGVARLGYGE 

NGTEAGDAEGYSVTGVAPAVASGRISYALGLEGPSISVDTACSSSLVALHLAVESLRL 

GESSLAWGGAAVMATPGVFVDFSRQRALAADGRSKAFGAAADGFGFSEGVSLVLLER 

LSEAESNGHEVLAVIRGSALNQDGASNGLAAPNGTAQRKVIRQALRNCGLTPADVDAV 

EAHGTGTTLGDPIEANALLDTYGRDRDPDHPLWLGSVKSNIGHTQAAAGVTGLLKMVL 

ALRHEELPATLHVDEPTPKVDWSSGAVRLATRGRPWRRGDRPRRAGVSAFGISGTNAH 

VIVEEAPERTTERTVGGDVGPVPLWSARSAAALRAQAAQVAELVEGSDVGLAEVGRS 

LAVTRARHEHRAAWASTRAEAVRGLREVAAVEPRGEDTVTGVAETSGRTVVFLFPGQ 

GSQWVGMGAELLDSAPAFADTIRACDEAMAPIiQDWSVSDVLRQEPGAPGLDRVDWQP 

VLFAVMVSLAPXWQSYGVTPAAWGHSQGEIAAAHVAGALSLADAARLVVGRSRLLRS 

LSGGGGMSAVALGEAEVRRRLRSWEDRISVAAVNGPRSVWAGEPEALREWGREREAE 

GVRVREIDVDYASHSPQIDRVRDELIiTVTGEIEPRSAEITFYSTVDVRAVDGTDLDAG 

YWYRNLRETVRFADAMTRLADSGYDAFVEVSPHPVWSAVAEAVEEAGVEDAVWGTIi 

SRGDGGPGAFLRSAATAHCAGVDVDWTPALPGAATIPLPTYPFQRKPYWLRSSAPAPA 

SHDLiAYRVSWTPITPPGDGVLDGDWLVVHPGGSTGWDGLAAAITAGGGRVVAHPVDS 

VTSRTGLAEALARRDGTFRGVTjSWATDERKVEAGAVALLTLAQALGDAGIDAPLWCIi 

TQEAVRTPVDGDLARPAQAALHGFAQVARLEIjARRFGGVLDIjPATVDAAGTRLVAAVL 

aggg ed wavrgdr lygr rlvratlpp pggg ft phgtvlvtgaag p vggrlarwlaer 
gatr-lvlpgahpgeelltairaagatawcepeaealrtaiggelptalvhaetltnf 
agvadadpedfaatvaaktalptvlaevlgdhrlerevycssvagvwggvgmaayaag 
sayldalvehrrarghasasvawtpwalpgavddgrlrerglrsldvadalgtwerll 
ragavsvavadvdwsvftegfaairptplfdell0rrgdpdgapvdrpgepagewgrr 
iaalspqeqretlltlvgetvaevlghetgteintrrafselgldslgsmalrqri^aa 
rtglrmpaslvfdhptvtalarylrrlwgdsdptpvrvfgptdeaepvawgigcrf 
pggiatpedlwrwsegtsittgfptdrgwdlrrlyhpdpdhpgtsyvdrggfldgap 
d fd pgf fg i tpre alamd pqqrlittie i aweave rag i d p etllgsdtgvf vgmngqs y 
lqlltgegdrlngyqglgnsasvlsgrvaytfgwegpaltvdtacssslvaihlamqs 
lrrgecslalaggvtvmadpytftofsaqrglaadg^^ 

leplskarrnghqvlavlrgsavnqdgasnglaapngpsqervirqaltasglrpadv 
drweahgtgtelgdpieagaiilaaygrdrdrplwlgsvktnightqaaagaagvikav 
lamrhg vl prs lhadels phi dw adg kve vlire arqw p pg e r p rrag vs s fgvsgtn a 

HVIVTIEAPAEPDPEPVPAAPGGPLPFVLHGRSVQTVRSQARTIjAEHLRTTGHRDIjADT 
ARTLATGRARFDVRAAVLGTDREGVCAALDAXAQDRPSPDVVAPAVFAARTPVLVFPG 
QGSQWVGMARDLLDSSEVFAESMGRCAEALSPYTDWDLLDVVRGVGDPDPYDRVDVLQ 
P^FAVMVSLARLWQSYGNrrPGAWGHSQGEIAAAHVAGALSLADAA^WALRSRVLR 
ELDDQGGMVSVGTSRAELDSVLRRWDGRVAVAAVNGPGTLWAGPTAELDEFLAVAEA 
REMRPRRIAVRYASHSPEVARVEQRLAAELGTVTAVGGTVPLYSTATGDLLDTTAMDA 
GYWYRNLRQPVIjFEHAVRSLLERGFETFIEVSPHPVLLriAVEETAEDAERPVTGVPTL 
RRDHDGPSEFLRNLLGAHVHGVDVDLRPAVAHGRLVDLPTYPFDRQRLWPKPHRRADT 
SSLGVRDSTHPLLHAAVDVPGHGGAVFTGRLiSPDEQQWLTQHVVGGRNLVPGSVLVDL 
ALTAGADVGVPVL.EELVLQQPLVLTAAGALLRLSVGAAD.EDGRRPVEIHAAEDVSDPA 
EARWSAYATGTLAVGVAGGGRDGTQWPPPGATALTLTDHYDTLAELGYEYGPAFQALR 
AAWQHGDWYAEVSLDAVEEGYAFDPVLLDAVAQTFGLTSRAPGKLPFAWRGVTLHAT
GATAVRWATPAGPDAVALRVTDPTGQLVATVDALWRDAGADRDQPRGRDGDLHRLE 
WVRLATPDPTPAA^AmVAADGLDDLLRAGGPAPQAVWRYRPDGDDPTAEARHGVIiWA 
ATLVRRWLDDDRWPATTLWATSAGVEVSPGDDVPRPGAAAVWGVLRCAQAESPDRFV 
LVDGDPETPPAVPDNPQLAVRDGAVFVPRLTPLAGPVPAVADRAYRLVPGNGGSIEAV 
AFAPVPDADRPLAPEEWVAVmATGVNFRDVLLALGMYPEPAEMGTEASGVVTEVGSG 
VRRFTPGQAVTGLFQGAFGPVAVADHRLIiTPVPDGWRAVDAAAVPIAFTTAHYALHDL 
AGLQAGQS VLVHAAAGGVGMAAVALARRAGAEVFATAS PAKHPTLRALGLDDDH I ASS 
RESGFGERFAARTGGRGVDWLNSLTGDLLDESARLLADGGVFVEMGKTDLRPAEQFR 
GRYVPFDLAEAGPDRLGEILEEWGLLAAGALDRLPVSVWELSAAPAALTHMSRGRHV 
GKLVLTQPAPVHPDGTVLVTGGTGTLGRLVARHLVTGHGVPHLLVASRRGPAAPGAAE 
LRAD VEGLGAT I E I VACDTADREALAALLDS I P ADRPLTG WHTAG VLADGLVTS I DG 
TATDQVLRAKVDAAWHLHDLTRDADLSFFVLFSSAASVLAGPGQGVYAAANGVIjNALA 
GQRRALGLPAKALGWGLWAQASEMTSGLGDRIARTGVAALPTERALALFDAALRSGGE 
VLFPLSVDRSALRRAEYVPEVLRGAVRSTPRAANRAETPGRGLLDRLVGAPETDQVAA 
IoAELVRSHAAAVAGYDSADQLPERKAFKDLGFDSLAAVELRNRLGVTTGVRLPSTtiVF 
D H PT PL AVAEHLRS ELF AD S APD VGVGARLDDLERAIjDAIi PD AQGHAD VG ARLE ALLR 
RWQ S RRPPETE P VT I S DDAS DDELFSMLDRRLGGGGD V " (SEQ ID NO: 14) 
misc_feature 22957..24237
/gene="megAII"
/function="KS3"
misc_feature 24544..25581
/gene="megAII"
/function="AT3"
misc_feature 26230..26733
/gene="megAII"
/function="KR3 (inactive)"
misc_feature 26998..27258
/gene="megAII"
/function="ACP3"
misc_feature 27393..28590
/gene="megAII"
/function="KS4"
misc_feature 28897..29931
/gene="megAII"
/function="AT4"
misc_feature 29953..30477
/gene="megAII"
/function="DH4"
misc_feature 31396..32244
/gene="megAII"
/function="ER4"
misc_feature 32257..32799
/gene="megAII"
/function="KR4"
misc_feature 33052..33312
/gene="megAII"
/function="ACP4"
gene 33666..43271
/gene="megAIII"
CDS 33666..43271
/gene="megAIII"
/note="polyketide synthase"
/codon_start=1
/transl_table=11
/product="megalomicin 6-deoxyerythronolide B synthase 3"
/translation="MSESSGMTEDRLRRYLKRTVAELDSVTGRLDEVEYRAREPIAVV

GMACRFPGGVDSPEAFWEFIRDGGDAIAEAPTDRGWPPAPRPRLGGLLAEPGAFDAAF. 

FG I S PRE ALATDPQQRLMLE I S WE ALERAGFDPS SLRGS AGGVFTG VG AVDYGPRPDE 

APEEVXGYVGIGTASSVASGRVAYTLGLEGPAVTVXITACSSGLTAVHLAMESLRRDEC 

TLV1AGGVTVMSSPGAFTEFRSQGGLAEDGRCKPFSRAADGFGLAEGAGVLVLQRLSV 

ARAEGRPVLAVLRGSAINQDGASNGLTAPSGPAQRRVIRQALERARLRPVDVDYVEAH 

GTGTRLGDPIEAHALLDTYGADREPGRPLWVGSVBCSNIGHTQAAAGVAGVMKTVLALR 

HREIPATL'HFDEPSPHVDWDRGAVSWSETRPWPVGERPRRAGVSSFGISGXNAHVIV 
EEAPSPQAADLDPTPGPATGATPGTDAAPTAEPGAEAVALVFSARDERALRAQAARLA

DRLTDDPAPSLRDTAFTLVTRPJVTWEKRAVVVGGGEEVLAGLRAVAGGRPVDGAVSGR 

ARAGRRWLVFPGQGAQWQGMARDLLRQSPTFAESIDACERALAPHVDWSLREVLDGE 

QSLDPVDWQPVLFAVt-tVSLARLWQSYGVTPGAWGHSQGEIAAAHVAGALSLADAAR 

WALRSRVLRRLGGHGG^.SFGLHPDQA.AERIARFAGAI^TVASVNGPRS\AT J AGENGP 

LDELIAECEAEGVTARRIPVDYASHSPQVESLREELLAALAGVRPVSAGIPLYSTLTG 

QVIETATMDADYWFANLREPVRFQDATRQLAEAGFDAFVEVSPHPVLTVGVEATLEAV 

LPPDADPCVTGTLRRERGGLAQFHTALAEAYTRGVEVDWRTAVGEGRPVDLPVYPFQR 

QNFWLPVPLGRVPDTGDEWRYQLAWHPVDLGRSSLAGRVLWTGAAVPPAWTDWRDG 

LEQRGATWLCTAQSRARIGAALDAVDGTALSTWSLLALAEGGAVDDPSLDTLALVQ 

ALGAAGIDVPLWLVTRDAAAVTVGDDVDPAQAMVGGLGRWGVESPARWGGLVDLREA 
DAD S AR S LAAI LAD P RG E EQ F A I R PDG VT V ARL V PA P ARAAGTRWT P RGTVL VTGGTG 

GIGAHLARWLAGAGAEHLVLLNRRGAEAAGAADLRDELVALGTGVTITACDVADRDRL 
AAVLD AARAQGR WT AVF HAAG I S RSTAVQE LTE S E FTE I TD AKVRGTANLAELC PEL 

DALVLFSSNAAVWGSPGLASYAAGNAFLDAFARRGRRSGLPVTSIAWGLWAGQNMAGT 

EGGDYLRSQGLRAMDPQRAIEELRTTLDAGDPWVSWDLDRERFVELFTAARRRPLFD 

ELGGVRAGAEETGQESDLARRLASMPEAERHEHVARLVRAEVAAVLGHGTPTVIERDV 

AFRDLGFDSMTAVDLRNRLAAVTGVRVATTIVFDHPTVDRLTAHYLERLVGEPEATTP 

AAAWPQAPGEADEPIAIVGMACRLAGGVRTPDQLWDFIVADGDAVTEMPSDRSWDLD 

ALFDPDPERHGTSYSRHGAFLDGAADFDAAFFGISPREALAMDPQQRQVLETTWELFE 

NAGIDPHSLRGTDTGVFLGAAYQGYGQNAQVPKESEGYLLTGGSSAVASGRIAYVLGL 

EGPAITVDTACSSSLVALHVAAGSLRSGDCGLAVAGGVSVMAGPEVFTEFSRQGALAP 

DGRCKPFSDQADGFGFA^GVAVVLLQRLSVAVREGRRVLGWVGSAVNQDGASNGLAA 

PSGVAQQRVIRRAWGRAGVSGGDVGWEAHGTGTRLGDPVELGALLGTYGVGRGGVGP 

VWGSVKANVGHVQAAAGWGVIKWLGLGRGLVGPMVCRGGLSGLVDWSSGGLVVAD 

GVRGWPVGVDGVRRGGVSAFGVSGTNAHVWAEAPGSWGAERPVEGSSRGLVGVAGG 

WPVVLSAKTETALTEI^ARRLHDATODTVALPAVAATLATGRAHLPYRAAJ^LARDHDE 

LRDRLRAFTTGSAAPGWSGVASGGGWFVFPGQGGQWVGMARGLLSVPVFVESV\fiSC 

DAWSSWGFSVLGVLEGRSGAPSLDR\^WQPVLFVV>1VSLARLWRWCGVVPAAVVG 

HSQGEIAAAWAGVLSVGDGARWALRARALRALAGHGGMVSLAVSAERARELIAPWS 

DRISVAAVNSPTSWVSGDPQALAALVAHCAETGERAKTLPVDYASHSAHVEQIRDTI 

LTDLADVTARRPDVALYSTLHGARGAGTDMDARYWYDNLRSPVRFDEAVEAAVADGYR 

VFVEMSPHPVLTAAVQEIDDETVAIGSLHRDTGERHLVAELARAHVHGVPVDWRAILP 

ATHPVPLPNYPFEATRYWLAPTAADQVADHRYRVDWRPLATTPAELSGSYLVFGDAPE 

TLGHSVEKAGGLLVPVAAPDRESLAVALDEAAGRLAGVLSFAADTATHLARHRLLGEA 

DVEAPLWLVTSGGVALDDHDPIDCDQAI1VWGIGRVMGLETPHRWGGLVDVTVEPTAED 

GWFAALLAADDHEDQVALRDGIRHGRRLVRAPLTTRNARWTPAGTALVTGGTGALGG 

HVARYLARSGVTDLVLLSRSGPDAPGAAELAAELADLGAEPRVEACDVTDGPRLRALV 

QELREQDRPVRIWHTAGVPDSRPLDRI DELES VSAAKVTGARLLDELCPDADTFVLF 

SSGAGWGSANLGAYAAANAYLDALAKRRRQAGRAATSVAWGAWAGDGMATGDLDGLT A 

RRGLRAMAPDRALRACTRRWTTHDTCVSVADVDWDRFAVGFTAARPRPLIDELVTSAP '' 

VAAPTAAAAPVPAMTADQLLQFTRSHVAAILGHQDPDAVGLDQPFTELGFDSLTAVGIi 

RNQLQQATGRTLPAALVFQHPTVRRLADHLAQQLDVGTAPVEATGSVLRDGYRRAGQT 

GDVRSYLDLLANLSEFRERFTDAASLGGQLELVDLADGSGPVTVICCAGTAALSGPHE 

FARLASALRGTVPVRALAQPGYEAGEPVPASMEAVLGVQADAVLAAQGDTPFVLVGHS 
AGALNUVYALATELADRGHPPRGVVLLDVYPPGHQEAVHAWLGELTAALFDHETVRMDD 

TRLTALGAYDRLTGRWRPRDTGLPTLWAASEPMGEWPDDGWQSTWPFGHDRVTVPGD 

HFSMVQERADAIARHIDAWLSGERA" (SEQ ID NO: 15)
misc_feature 33780..35027
/gene="megAIII"
/function="KS5"
misc_feature 35385..36419
/gene="megAIII"
/function="AT5"
misc_feature 37068..37604
/gene="megAIII"
/function="KR5"
misc_feature 37860..38120
/gene="megAIII"
/function="ACP5"
misc_feature 38187..39470
/gene="megAIII"
/function="KS6"
misc_feature 39795..40811
/gene="megAIII"
/function="AT6"
misc_feature 41406..41936
/gene="megAIII"
/function="KR6"
misc_feature 42168..42425
/gene="megAIII"
/function="ACP6"
misc_feature 42585..43271
/gene="megAIII"
/function="TE"
gene 43268..44344
/gene="megCII"
CDS 43268..44344
/gene="megCII"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyglucose 3,4-isomerase"
/translation="MNTTDRAVLGRRLQMIRGLYWGYGSNGDPYPMLLCGHDDDPHRW

YRGLGGSGVRRSRTETl^SArrDHATAVRVLDDPTFTRATGRTPEWMRAAGAPASTWAQP 

FRDVHAASWDAELPDPQEVEDRLTGLLPAPGTRLDLVRDLAWPMASRGVGADDPDVLR 

AAWDARVGLDAQLTPQPIiAVTEAAIAAVPGDPHRRALFTAVEMTATAFVDAVLAVTAT 

AGAAQRLADDPDVAARIiVAEVLRLHPTAHLERRTAGTETWGEHTVAAGDEVWVVAA 

AMRDAGVFADPDRIiDPDRADADRALSAQRGHPGRLEEIi\AA/LTTAALRSVAKALPGLT 

AGGPWRRRRSPVLRATAHCPVEL" (SEQ ID NO: 16) 
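
The domain coordinates annotated for megAIII can be sanity-checked arithmetically: each misc_feature should span a whole number of codons and begin in the reading frame of the CDS (33666..43271). The sketch below is illustrative only and is not part of the original listing. The coordinates are transcribed from the feature table above, the label "ACP5" is an assumed reading of a garbled entry, and `domain_report` is a hypothetical helper name.

```python
CDS_START, CDS_END = 33666, 43271  # megAIII CDS as annotated in this listing

# (domain, start, end); 1-based inclusive coordinates from the feature table.
DOMAINS = [
    ("KS5", 33780, 35027),
    ("AT5", 35385, 36419),
    ("KR5", 37068, 37604),
    ("ACP5", 37860, 38120),  # assumed reading of a garbled label
    ("KS6", 38187, 39470),
    ("AT6", 39795, 40811),
    ("KR6", 41406, 41936),
    ("ACP6", 42168, 42425),
    ("TE", 42585, 43271),
]


def domain_report(domains, cds_start):
    """Return (name, nucleotides, codons, in_frame) for each domain."""
    rows = []
    for name, start, end in domains:
        length = end - start + 1  # inclusive coordinates
        in_frame = (start - cds_start) % 3 == 0 and length % 3 == 0
        rows.append((name, length, length // 3, in_frame))
    return rows


for name, nt, codons, ok in domain_report(DOMAINS, CDS_START):
    print(f"{name:5s} {nt:5d} nt {codons:4d} codons in-frame={ok}")
```

A domain that fails the in-frame test would point to a transcription error in its coordinates rather than to a biological feature, which makes this a cheap check when re-keying a listing.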

gene 44355..45623
/gene="megCIII"
CDS 44355..45623
/gene="megCIII"
/codon_start=1
/transl_table=11
/product="TDP-desosamine glycosyltransferase"
/translation="MRVVFSSMASKSHLFGLVPLAWAFRAAGHEVRVVASPALTDDIT
AAGLTAVPVGTDVBLVDFMTHAGYDIIDYVRSLDFSERDPATSTWDHLLGMQTVTiTPT
FYAIiMSPDSLVEGMISFCRSWRPDWSSGPQTFAASIAAWTGVAHARIjLWGPDITVRA 
RQKFLGLLPGQPAAHREDPLAEWLTWSVERFGGRVPQDVEELWGQWTIDPAPVGMRL 
DTGLRTVGMRYVDYNGPSWPDWLHDEPTRRRVCLTIjGISSRENSIGQVSVDDLLGAL 
GDVDAEIIATVDEQQLEGVAHVPANIRTVGFVPMHALLPTCAATVHHGGPGSWHTAAI 
HGVPQVILPDGWDTGVRAQRTEDQGAGIALPVPELTSDQLREAVRRVLDDPAFTAGAA 
RMRADMLAEPSPAEVVDVCAGLVGERTAVG" (SEQ ID NO: 17) 

gene 45620..46591
/gene="megBII"
CDS 45620..46591
/gene="megBII"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyglucose 2,3 dehydratase"
/translation="MSTDATHVRLGRCALLTSRLWLGTAALAGQDDADAVRLLDHARS

RGVNCLDTADDDSASTSAQVAEESVGRWLAGDTGRREETVLSVTVGVPPGGQVGGGGL 

SARQIIASCEGSLRRLGVDHVT)VXHLPRVDRVEPWDEVWQAVDALVAAGKVCYVGSSG 

FPGWHIVAAQEHAVRRHRLGLVSHQCRYDLTSRHPELEVLPAAQAYGLGVFARPTRIiG 

GLLGGDGPGAAAARASGQPTALRSAVEAYEVFCRDLGEHPAEVALAWVLSRPGVAGAV 

VG ARTPG RIiDS ALRACG VAIjG ATELTAJODG I FPG VAAAGAAPEAWLR " (SEQ ID NO: 18) 

gene complement(46660..47403)
/gene="megH"
CDS complement(46660..47403)
/gene="megH"
/note="putative thioesterase"
/codon_start=1
/transl_table=11
/product="TEII"
/translation="MNTWLRRFGSADGHRARLYCFPHAGAAADSYLDLARALAPEVDV
WAVQYPGRQDRRDERALGTAGEIADEVAAVLRDLVGEVPFALFGHSMGALVAYETARR 
LEARPGVRPLRLFVSGQTAPRVHERRTDLPDEDGLVEQMRRLGVSEAALADQGLLDMS 
LPVLRADHRVLRSYAWQAGPPLRAGITTLCGDTDPLTTVEDAQRWLPYSVVPGRTRTF

PGGHFYLADHVGEVAESVAPDIiLRLTPTG" (SEQ ID NO: 19) 
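
Features such as megH are annotated with a complement(...) location, meaning the coding strand is the reverse complement of the sequence printed under ORIGIN. The following sketch shows one way to resolve such a location. It is an illustration under stated assumptions rather than part of the listing: `parse_location`, `reverse_complement` and `extract` are hypothetical helper names, only simple single-span locations are handled, and partial-end markers (`<`, `>`) are ignored.

```python
import re

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(dna):
    """Return the reverse complement of a lower- or upper-case DNA string."""
    return dna.lower().translate(_COMPLEMENT)[::-1]


def parse_location(loc):
    """Parse 'START..END' or 'complement(START..END)' into (start, end, flag)."""
    text = re.sub(r"\s+", "", loc)
    is_complement = text.startswith("complement(")
    match = re.search(r"[<>]?(\d+)\.\.[<>]?(\d+)", text)
    if match is None:
        raise ValueError(f"unsupported location: {loc!r}")
    return int(match.group(1)), int(match.group(2)), is_complement


def extract(sequence, loc):
    """Return the feature sequence, read 5' to 3' on its own strand."""
    start, end, is_complement = parse_location(loc)
    region = sequence[start - 1:end]  # listing coordinates are 1-based, inclusive
    return reverse_complement(region) if is_complement else region


print(parse_location("complement(46660..47403)"))  # (46660, 47403, True)
```

For a complement feature the extracted string can then be translated in frame 1, exactly as for a forward-strand CDS.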
gene complement(47411..>47981)
/gene="megF"
CDS complement(47411..>47980)
/gene="megF"
/codon_start=1
/transl_table=11
/product="C-6 hydroxylase"
/translation="IRVQDDDADRLSRDELTSIALVLLLAGFEASVSLIGIGTYLLLT
HPDQLALVRKDPALIiPGAVEEILRYQAPPETTTRFATAEVEIGGVTIPAYSTVLIANG 
AAMRDPGQFPDPDRFDVTRDSRGHLTFGHG1HYCMGRPLAKLEGEVALGALFDRFPKL 
SLGFPSDEWWRRSLLLRGIDHI>PVRPNG" (SEQ ID NO: 20)

BASE COUNT 5962 a 16875 c 18045 g 7099 t 
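
The BASE COUNT line gives per-base totals that can be used to verify a transcription of the ORIGIN block. The sketch below is a hedged illustration, not part of the original listing: it sums the reported counts, derives the G+C fraction, and provides a hypothetical `count_bases` helper that tallies a pasted ORIGIN block while ignoring position numbers and spacing.

```python
from collections import Counter

# Values copied from the BASE COUNT line of this listing.
REPORTED = {"a": 5962, "c": 16875, "g": 18045, "t": 7099}


def count_bases(origin_text):
    """Tally a, c, g and t in ORIGIN text, skipping digits and whitespace."""
    return Counter(ch for ch in origin_text.lower() if ch in "acgt")


def gc_fraction(counts):
    """Fraction of G+C among the four unambiguous bases."""
    total = sum(counts[b] for b in "acgt")
    return (counts["c"] + counts["g"]) / total


total = sum(REPORTED.values())
print(total)                             # 47981
print(round(gc_fraction(REPORTED), 3))   # 0.728
```

The total of 47981 bases agrees with the upper bound of the last annotated feature, and the G+C content of roughly 73% is in the range expected for actinomycete DNA.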

ORIGIN 

1 ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 
61 gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 
121 atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 
181 taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 
241 cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 
301 gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 

361 tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact
421 gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg
481 ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag
541 catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 
601 cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 
661 ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 
721 gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 
781 ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 
841 ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 
901 tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 
961 gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 

1021 atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 
1081 ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 
1141 gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 
1201 caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 
1261 accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 
1321 ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg
1381 gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 
1441 tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 
1501 accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 
1561 gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 
1621 gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 
1681 cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 
1741 gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 
1801 gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 
1861 cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 
1921 gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 
1981 ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 
2041 cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 
2101 caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 
2161 ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 
2221 cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat
2281 cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 
2341 cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 
2401 ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 
2461 ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 
2521 gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 
2581 ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 
2641 gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 
2701 cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 
2761 gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 
2821 tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc
2881 catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga 
2941 cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt
3001 gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 
3061 ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 
3121 ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 
3181 gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct 
3241 cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 
3301 catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 
3361 cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg
3421 aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 
3481 tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 
3541 tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaaeagctac gtgtacgccg 
3601 acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 
3661 tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc 
3721 gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 
3781 tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 
3841 tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 
3901 tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 
3961 tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac
4021 cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 
4081 ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 
4141 tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 
4201 ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg 
4261 ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 
4321 ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg 
4381 tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 
4441 ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 
4501 tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 
4561 gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 
4621 cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 
4681 gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 
4741 agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 
4801 atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 
4861 gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 
4921 ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac 
4981 ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg 
5041 gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 
5101 ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 
5161 gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 
5221 tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 
5281 ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc 
5341 cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 
5401 gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac 
5461 tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac 
5521 gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 
5581 tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 
5641 gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc 
5701 atgtacccct ccctccctca cgaccfcgcag gacagggtga tcgaggcggt gcgggaggtc 
5761 atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 
5821 catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 
5881 catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct 
5941 cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 
6001 gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct 
6061 gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 
6121 cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 
6181 caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 
6241 cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 
6301 cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 
6361 caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt 
6421 ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc 
6481 ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca
6541 cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 
6601 gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 
6661 ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 
6721 ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac
6781 ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg

6841 gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag 

6901 ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc 

6961 tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag 

7021 gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac 

7081 ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg 

7141 atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc

7201 cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt

7261 cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt 

7321 cgtgccctcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg 

7381 agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag 

7441 gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg 

7501 cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt

7561 cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga 

7621 gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa 

7681 ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg 

7741 ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc 

7801 cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta 

7861 cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg 

7921 acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt 

7981 ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc

8041 tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac 

8101 ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc 

8161 ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg 

8221 gttcgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc 

8281 ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg 

8341 gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac 

8401 cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg 

8461 ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg 

8521 ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca 

8581 cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca 

8641 ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct 

8701 gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac 

8761 tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt 

8821 ggtgcgtctg gtagatgtcg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg 

8881 cggcgacgat gtgtcgggcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc 

8941 ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc 

9001 cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga

9061 tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca 

9121 cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc 

9181 cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg 

9241 gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag 

9301 cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc 

9361 gagagcctgc cagagggtgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc 

9421 cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg 

9481 cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc 

9541 ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg 

9601 caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag 

9661 ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc 

9721 cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga 

9781 ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa 

9841 gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag 

9901 gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc 

9961 gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc 

10021 ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt 

10081 gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc 

10141 gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc 

10201 gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt 

10261 gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc 

10321 gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc 

10381 cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa 

10441 gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat 

10501 gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt 

10561 cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt 
10621 ctccggtgga gggacgctga cgacgggcac cgcagggttg ccggtctgac gtgccacgct
10681 ggcggcgacg gtctcgaaga tctcgccgag gggtcgggcc tcgtccgcgc tcggcgtcca 
10741 gacgtcgccg accagcgcct cgtggttgtg cagtgcggcg gtgaacgcgg tggccacgtc 
10801 ctcgacgtgc aggaggttgc ggcgcacgct gccctcgtgc cacatcgtga tcggctcacc 
10861 ggcgagggct cgccggatca tggcggtgac gacaccccgg ccggtctgcc ccgacgggcc 
10921 gctgtggccg tagatcgcgg gcaggcgcag gatcaccccg tcgacgaccc cgtcctcggt
10981 ggcctgacgc aggatccgct cggcctcgat cttgtgctgg gcgtaccggc tgggggcggc 
11041 ggggttcgcg gcctgggtgg tgctggcgaa caggagcacc ggcgcgggtc cgggtcttgc 
11101 ccgcagcgcg gcgacgaggt cgcgcatgat gcccgcgttg acgcgttcgg cctcgggcac
11161 cgtggcggcg ctgcgccagg tcgacccgcc ggcggcgtag gcgaccagat gcacgacgac 
11221 gtcggtgtcg gcgacgacct gcgcgacccg gccgggttcg agcaggtcga ctcgaaggtg 
11281 ctcgatcccg gcgctgcctg gtggctggtc gcgagacccg gtgcgcgcga cggcccgcag
11341 tcggagaggg tgtgtggtaa attcgcgaag aagggcgctt ccgacgaatc cagaaacgcc 
11401 gagaagtgtg acatgtcttg tcatctacta atgcattccg atagccaccg gcgcatggaa 
11461 tccatttgtt ccccccaggg tggtgtcggg tgacaaatcc ggcctcaggt cggcctcaag 
11521 cctctttcga gcgggtgctg aggcttcccg cgtaccctcg gtggcctgcg ttcgggcggg 
11581 tgtcggggaa agggcggatc gaggagttcg gtagggcgtc gcggcgcgta ctccgggact 
11641 gatccgggtc gacgccccga cgcgtgacag ggcgtcgatc cgtgccgccc gtaccgccgg 
11701 ttttcggcga tggtcgcaga ttcctcccga cgtg^tggac tcattggttc tcccgggtgt 
11761 ggccgcaccg tcggtggcct cgtcgggggt gtcggagacc gggtcgatcg ccgtccccgg 
11821 ccgtgccgac cagggtcggt ccgtcgccga ggtgggtcac cgtcgggtgg acccggtccg
11881 ccggcggcca ccgcccgatc gtgcccacct tcgcctccgc gggtaaatgc ttcgtcgatc 
11941 tgatcgacac ttccggcgac gctatcaccg gagcattccc cggcaccacc ggtcgatgcc 
12001 tcgcgctttc caaacaggga aaacagcagc tcacagcggt tccaggcgcc gggcaatcct 
12061 agcgaagagt ctcgatgggg tcaaggtgaa ttctgtcaca gatgtttttg ttaaatgtac
12121 tttcttcagc caccctcgac gttcatacaa ttggccggca tctctaccaa gggggagtga 
12181 gtggttgacg tgcccgatct actcggcacc cggactccgc acccagggcc gctcccattc 
12241 ccgtggcccc tgtgcggtca caacgaaccg gagctgcggg cccgcgcccg tcaattgcac 
12301 gcatatctcg aaggcatttc cgaggatgac gtggtggccg tcggcgccgc cctcgcgcgc
12361 gagacacgcg cgcaggacgg gccgcaccgc gccgtcgtcg tggcctcctc ggtcaccgag
12421 ctgaccgccg cgctcgccgc cctcgcccag ggccgcccac acccctcggt ggtacgcggt 
12481 gtcgcccgac ccacggcacc ggtggtgttc gtcctgcccg gtcagggcgc ccagtggccc 
12541 ggcatggcga cccgactgct cgccgagtcg cccgtcttcg ccgcggcgat gcgggcctgc 
12601 gagcgggcct tcgacgaggt caccgactgg tcgttgaccg aggtcctgga ctcacccgag
12661 cacctgcgcc gcgtcgaggt ggtccagccc gcgctcttcg cggtgcagac ctcactggcc 
12721 gccctgtggc ggtcgttcgg ggtgcgaccc gacgccgtac tcggacacag catcggtgag 
12781 ctggccgccg ccgaggtctg cggcgccgtc gacgtcgagg ccgccgcgcg ggccgccgcc 
12841 ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc 
12901 tccccggccg agctggcagc ccgggtcgag cggtgggacg acgacgtcgt gccggccggg 
12961 gtcaacggtc cccggtcggt gctgctcacc ggcgctcccg agcccatcgc acggcgggtc 
13021 gccgagctgg cggcacaggg cgtacgcgcc caggtcgtca acgtgtcgat ggcggcgcac 
13081 tcggcgcagg tcgacgccgt cgccgagggc atgcgctcgg cgctgacctg gttcgccccc 
13141 ggcgactccg acgtgcccta ctacgccggc ctcaccggcg ggcggctgga cacccgggaa 
13201 ctcggcgccg accactggcc gcgcagtttc cggctcccgg tgcgcttcga cgaggcgacc 
13261 cgtgcggtcc tggaactgca gcccggcacg ttcatcgagt cgagcccgca cccggtgctg 
13321 gcggcctccc tgcagcagac cctcgacgag gtcgggtccc cggccgcgat cgtgccgacc
13381 ctgcaacgcg accagggcgg tctgcggcgg ttcctgctcg ccgtggcgca ggcgtacacc 
13441 ggtggcgtga cagtcgactg gaccgccgcc taccccgggg tgacccccgg ccacctgccg 
13501 tcggccgtcg ccgtcgagac cgacgaggga ccctcgacgg agttcgactg ggccgcgccc 
13561 gaccacgtac tgcgcgcgcg gctgctggag atcgtcggcg ccgagacggc cgcgctcgcc 
13621 gggcgggagg tcgacgcccg ggccaccttc cgggaactgg gcctcgactc ggtcctcgcg 
13681 gtgcagctgc ggacccgcct cgccacggcg accgggcggg atctgcacat cgccatgctc 
13741 tacgaccacc cgaccccgca cgccctcacc gaggcgctgc tgcgcggccc gcaggaggag 
13801 ccggggcggg gtgaggagac ggcacacccg acggaggccg aacccgacga acccgtcgcc 
13861 gtggtcgcca tggcgtgccg gctgcccggc ggcgtcacct caccggagga gttctgggag
13921 ctgctggccg aggggcggga cgccgtcggc gggctgccca ccgaccgggg atgggacctg 
13981 gactcgctgt tccacccgga cccgacccgg tcgggcacgg cgcaccagcg cgctggtggc
14041 ttcctcaccg gcgccacctc cttcgacgct gccttcttcg ggctgtcgcc acgggaggca 
14101 ctggccgtcg agccgcagca gcggatcacg ttggagctgt cgtgggaggt gctggaacgc 
14161 gccgggatcc ccccgacgtc gttgcggacc tcccggaccg gggtgttcgt cggtctgatc 
14221 ccccaggagt acggcccccg gctggccgag gggggtgagg gcgtcgaggg ctacctgatg 
14281 accgggacca ccaccagcgt cgcctccggt cgggtcgcct acaccctcgg cctggagggg 
14341 ccggcgatca gcgtcgacac cgcctgctcg tcgtcgctcg tcgccgtgca cctggcgtgc 
14401 cagtcgctgc ggcgcggcga gtcgacgatg gcgctcgccg gtggcgtgac ggtgatgccg
14461 acaccgggca tgctcgtgga cttcagtcgg atgaactccc tcgcccccga cggacggtcc

14521 aaggcgttct cggccgccgc cgacgggttc ggcatggccg aaggcgcagg gatgctcctg 

14581 ctggaacggc tctcggacgc ccgccgccac ggccacccgg tgctcgccgt gatcaggggc 

14641 accgctgtca actccgacgg cgcgagcaac ggactctccg ccccgaacgg ccgggcccag 

14701 gtccgggtga tccgacaggc cctcgccgag tccgggctga cgccccacac cgtcgacgtc 

14761 gtggagaccc acggcaccgg cacccgcctc ggtgatccga tcgaggcacg ggcgctctcc 

14821 gacgcgtacg gcggtgaccg tgagcacccg ctgcggatcg gctcggtcaa gtccaacatc 

14881 gggcacaccc aggccgccgc cggtgtcgcc ggtctgatca aactggtgtt ggcgatgcag 

14941 gccggtgtcc tgccccgcac cctgcacgcc gacgagccgt caccggagat cgactggtcc 

15001 tcgggcgcga tcagcctgct ccaggagccc gctgcctggc ccgccggcga gcggccccgc 

15061 cgggccgggg tgtcctcgtt cggcatcagc ggcaccaacg cacacgcgat catcgaggag

15121 gcgccgccga ccggtgacga cacccgaccc gaccggatgg gcccggtggt gccctgggtg 

15181 ctctcggcga gcaccggcga ggcgttgcgc gcccgggcgg cgcggctggc cgggcaccta

15241 cgcgagcacc ccgaccagga cctggacgac gtcgcctact cgctggccac cggtcgggcc 

15301 gcgctggcgt accgtagtgg gttcgtgccc gccgacgcgt ccacggcgct gcggatcctc 

15361 gacgaactcg ccgccggtgg atccggggac gcggtgaccg gcaccgcccg cgccccgcag 

15421 cgcgtcgtct tcgtcttccc cggccaggga tggcagtggg cggggatggc agtcgacctg 

15481 ctcgacggcg acccggtctt cgcctcggtg ctgcgggagt gcgccgacgc gttggaaccg 

15541 tacctggact tcgagatcgt cccgttcctg cgggccgagg cgcagcgccg gacccccgac 

15601 cacacgctct ccaccgaccg cgtcgacgtg gtccagccgg tgctgttcgc ggtgatggtg 

15661 tccctggcgg cccggtggcg ggcgtacggg gtggaaccgg cggccgtcat cggacactcc 

15721 cagggggaga ttgccgcggc gtgtgtggcc ggggcgctct cgctggacga cgcggcccgg 

15781 gcggtggccc tgcgcagccg ggtcatcgcc accatgcccg gcaacggcgc gatggcctcg 

15841 atcgccgcct ccgtcgacga ggtggcggcc cggatcgacg ggcgggtcga gatcgccgcc

15901 gtcaacggtc cgcgcgcggt ggtggtctcc ggcgaccgtg acgacctgga ccgcctggtc 

15961 gcctcctgca ccgtcgaggg ggtgcgggcc aagcggctgc cggtggacta cgcgtcgcac 

16021 tcctcgcacg tcgaggccgt ccgtgacgcg ctccacgccg aactcggcga gttccggccg 

16081 ctgccgggct tcgtgccgtt ctactcgaca gtcaccggcc gctgggtcga gcccgccgaa 

16141 ctcgacgccg ggtactggtt tcgcaacctg cgccacaggg tccggttcgc cgacgcggtc 

16201 cgctccctcg ccgaccaggg gtacacgacg ttcctggagg tcagcgccca cccggtgctc 

16261 accacggcga tcgaggagat cggtgaggac cgtggcggtg acctcgtcgc tgtccactcg 

16321 ctgcgacgtg gggccggcgg tcccgtcgac ttcggctccg cgctggcccg cgccttcgtg 

16381 gccggcgtcg cagtggactg ggagtcggcg taccagggtg ccggggcgcg tcgggtgccg

16441 ctgcccacgt acccgttcca gcgtgagcgc ttctggttgg aaccgaatcc ggcccgcagg

16501 gtcgccgact ccgacgacgt ctcgtccctg cggtaccgca tcgaatggca cccgaccgat 

16561 ccgggtgagc cgggacggct cgacggcacc tggctgctgg cgacgtaccc cggtcgggcc 

16621 gacgaccggg tcgaggcggc gcggcaggcg ctggagtccg ccggggcgcg ggtcgaggac 

16681 ctggtggtgg agccccggac gggccgggtc gacctggtgc ggcggctcga cgccgtgggt 

16741 ccggtggcgg gcgtgctctg cctgttcgct gtcgcggagc cggcggccga acactccccg 

16801 ctggcggtga cgtcgttgtc ggacacgctc gacctgaccc aggcggtggc cgggtcgggc 

16861 cgggagtgtc cgatctgggt ggtcaccgag aacgccgtcg ccgtcgggcc cttcgaacgg 

16921 ctccgcgacc cggcccacgg cgcgctctgg gccctcggtc gggtcgtcgc cctggagaac 

16981 cccgccgtct ggggcggcct ggtcgacgtg ccgtcgggtt cggtcgccga gctgtcgcgt 

17041 cacctcggga cgaccctgtc cggcgccggc gaggaccagg tcgccctccg acccgacggg 

17101 acgtacgccc gccggtggtg cagggcgggc gcgggcggca cgggccggtg gcagccccgg 

17161 ggcacggtgc tcgtcaccgg cggcaccggc ggggtcggtc ggcacgtcgc ccggtggctg 

17221 gcccgccagg gcaccccgtg cctggtgctg gccagccgcc ggggaccgga cgccgacggg 

17281 gtcgaggagc tactcaccga actcgccgac ctgggcaccc gggccaccgt caccgcctgc 

17341 gacgtcaccg accgggagca gctccgtgcc ctcctcgcga ccgtcgacga cgagcacccg 

17401 ctgtcggcgg tgttccacgt cgccgcgacg ctcgacgacg gcaccgtcga gaccctcacc 

17461 ggtgaccgca tcgaacgggc caaccgggcg aaggtgctcg gtgcccgcaa cctgcacgag 

17521 ctgacccggg acgccgacct cgacgcgttc gtgctcttct cctcctccac cgccgcgttc 

17581 ggcgcgccgg ggctcggcgg ctacgtcccg ggcaacgcct acctcgacgg tctcgcccag 

17641 cagcgacgca gcgagggact cccggccacc tcggtggcgt ggggtacctg ggcgggcagc 

17701 gggatggccg agggtccggt cgccgaccgg ttccgccggc acggggtcat ggagatgcac 

17761 cccgaccagg ccgtcgaggg tctccgggtg gcactggtgc agggtgaggt agccccgatc 

17821 gtcgtcgaca tcaggtggga ccggttcctc ctcgcgtaca ccgcgcagcg ccccacccgg 

17881 ctcttcgaca ccctcgacga ggcccgtcgg gccgcgcccg gtcccgacgc cgggccgggg 

17941 gtggcggcgc tggccgggct gcccgtcggg gaacgcgaga aggcggtcct cgacctggta 

18001 cggacgcacg cggctgccgt cctcggccac gcctcggccg agcaggtgcc cgtcgacagg 

18061 gccttcgccg aactcggcgt cgactcgctg tcggccctgg aactgcgcaa ccggctgacc 

18121 actgcgaccg gggtccggct ggccacgacg acggtcttcg accacccgga cgtacggacc 

18181 ctggccggac acctggccgc cgaactgggc ggcggatcgg ggcgggagcg gcccgggggc 

18241 gaggccccga cggtggcccc gaccgacgag ccgatcgcca tcgtcgggat ggcctgccgg
18301 ctgccggggg gagtggactc accggagcag ctgtgggagt tgatcgtctc cgggcgggac
18361 accgcctcgg cggcacccgg ggaccggagc tgggatccgg cggagttgat ggtctccgac
18421 acgacgggca cccgtaccgc cttcggcaac ttcatgcccg gggcgggcga gttcgacgcg 
18481 gcgttcttcg ggatctcgcc gcgtgaggcg ttggcgatgg atccgcagca gcggcacgcc 
18541 ctggagacca cctgggaggc gctggagaac gccggtatcc ggcccgagtc gttgcgcggt 
18601 acggacaccg gtgtcttcgt gggcatgtcc catcaggggt acgccaccgg ccgcccgaag 
18661 cccgaggacg aggtcgacgg ctacctgttg acaggcaaca ccgcgagcgt cgcctccggt 
18721 cggatcgcgt acgtgttggg gttggagggg ccggcgatca ctgtggacac ggcgtgttcg 
18781 tcgtcgcttg tggcgttgca cgtggcggcg ggttcgttgc gttctgggga ctgtggtctg 
18841 gcggtggcgg gtggggtgtc ggtgatggcc ggtccggagg tgttcaggga gttctcccgg 
18901 cagggcgcgt tggctccgga cggcaggtgc aagcccttct cggacgaggc cgacggcttc 
18961 ggtctggggg aggggtcggc cttcgtcgtg ttgcagcggt tgtcggtggc ggtgcgggag 
19021 gggcgtcggg tgttgggtgt ggtggtgggt tcggcggtga atcaggatgg ggcgagtaat 
19081 gggttggcgg cgccgtcggg ggtggcgcag cagcgggtga ttcggcgggc gtggggtcgt 
19141 gcgggtgtgt cgggtgggga tgtgggtgtg gtggaggcgc atgggacggg gacgcggttg
19201 ggggatccgg tggagttggg ggcgttgttg gggacgtatg gggtgggtcg gggtggggtg 
19261 ggtccggtgg tggtgggttc ggtgaaggcg aatgtgggtc atgtgcaggc ggcggcgggt 
19321 gtggtgggtg tgatcaaggt ggtgttgggg ttgggtcggg ggttggtggg tccgatggtg 
19381 tgtcggggtg ggttgtcggg gttggtggat tggtcgtcgg gtgggttggt ggtggcggat
19441 ggggtgcggg ggtggccggt gggtgtggat ggggtgcgtc ggggtggggt gtcggcgttt 
19501 ggggtgtcgg ggacgaatgc tcatgtggtg gtggcggagg cgccggggtc ggtggtgggg
19561 gcggaacggc cggtggaggg gtcgtcgcgg gggttggtgg gggtggttgg tggtgtggtg 
19621 ccggtggtgc tgtcggcaaa gaccgaaacc gccctgcacg cccaggcacg tcgactcgcc
19681 gaccacctgg agacgcaccc cgacgtcccg atgaccgacg tggtgtggac gctgacgcag 
19741 gcccgccaac gcttcgacag gcgcgcggtc ctcctcgccg ccgaccggac ccaggccgtg 
19801 gaacggctgc gcggcctcgc cgggggcgaa ccggggaccg gtgtggtgtc gggggtggcg 
19861 tcgggtggtg gtgtggtgtt tgtttttcct ggtcagggtg gtcagtgggt ggggatggcg 
19921 cgggggttgt tgtcggttcc ggtgtttgtg gagtcggtgg tggagtgtga tgcggtggtg 
19981 tcgtcggtgg tggggttttc ggtgttgggg gtgttggagg gtcggtcggg tgcgccgtcg 
20041 ttggatcggg tggatgtggt gcagccggtg ttgttcgtgg tgatggtgtc gttggcgcgg 
20101 ttgtggcggt ggtgtggggt tgtgcctgcg gcggtggtgg gtcattcgca gggggagatc 
20161 gcggcggcgg tggtggcggg ggtgttgtcg gtgggtgatg gtgcgcgggt ggtggcgttg 
20221 cgggcgcggg cgttgcgggc gttggccggc cacggcggca tggcctcggt acgccgaggc 
20281 cgcgacgacg tacagaagct cctcgacagc ggcccctgga cggggaagct ggagatcgcc 
20341 gcggtcaacg gccccgacgc ggtggtggtc tccggcgacc cccgagccgt gaccgagctg 
20401 gtcgagcact gtgacgggat cggggtccgg gcccggacga tccccgtcga ctacgcctcc 
20461 cactccgcac aggtcgagtc gctccgggag gagctgctct ccgtcctggc cgggatcgag 
20521 ggccgcccgg cgacggtgcc gttctactcc accctcaccg gtgggttcgt cgacggcacc 
20581 gaactggacg ccgactactg gtaccgcaac ctgcgccacc cggtgcggtt ccacgccgcc 
20641 gtcgaggcgc tggcagcgcg tgacctcacc acgttcgtcg aggtcagccc gcaccccgtg 
20701 ctgtcgatgg cggtcgggga gacgcttgcc gacgtggagt ccgccgtcac tgtgggcacc
20761 ctggaacgcg acaccgacga cgtcgagcgc ttcctcacct ccctcgccga ggcgcacgtc 
20821 cacggcgtac ccgtggactg ggcggcggtc ctcggctccg gaaccctggt cgacctgccc 
20881 acctatccct tccagggacg gcggttccgg ctgcaccccg accgtggtcc gcgtgacgat 
20941 gtcgccgact ggttccaccg ggtcgactgg acggcgacgg ccaccgacgg gtcggcccga 
21001 ctcgacggtc gctggctggt ggtcgtaccc gaggggtaca cggacgacgg ctgggtcgtg 
21061 gaggtgcggg ccgccctcgc cgccggtggt gccgagccgg tggtgacgac ggtcgaggag 
21121 gtcaccgacc gggtcggtga cagcgacgcg gtggtgtcga tgctcgggct ggccgacgac 
21181 ggtgcggccg agaccctggc gctgctgcga cgactcgacg cacaggcgtc caccacccca 
21241 ctgtgggtgg tcaccgtggg ggccgtcgcc cccgccggtc cggtgcagcg ccccgaacag 
21301 gcgacggtgt gggggttggc ccttgtcgcc tccctggaac gcggacaccg gtggaccggc 
21361 ctgctggatc tgccgcagac accggacccg cagctacgac cccggctggt cgaggcgctc 
21421 gccggtgccg aggaccaggt agcggtccgc gccgacgccg tacacgcccg tcggatcgtc 
21481 cccaccccgg tcaccggagc cgggccgtac accgccccgg gcgggacgat cctcgtcacc
21541 gggggcaccg ccggtctggg tgccgtcacc gcccgatggc tcgccgagcg cggtgccgaa 
21601 cacctcgccc tggtcagccg gcgcgggccg ggcaccgccg gcgtcgacga ggtggtccgg 
21661 gacctgaccg ggctcggcgt acgggtgtcg gtgcactcct gcgacgtcgg cgaccgcgag 
21721 tcggtcggcg ccctggtgca ggagttgaca gcagccggtg acgtggtccg gggggtggtc 
21781 cacgctgccg gtctgcccca gcaggtgcca ctgaccgaca tggacccggc cgacctcgcc 
21841 gacgtggtgg ccgtgaaggt cgacggcgcg gtgcacctgg ccgacctgtg cccggaggcc 
21901 gaactgttcc tgctgttctc ctccggggcc ggggtgtggg gcagtgcccg tcagggtgcg 
21961 tacgccgccg gaaacgcctt cctggacgcc ttcgcccgac accggcggga ccggggtctg 
22021 cccgccacct cggtggcgtg ggggctctgg gcggccgggg ggatgacagg ggaccaggag 
22081 gcggtgtcgt tcctgcgtga gcggggcgta cggccgatgt cggtgccgag ggcactggaa
22141 gcgctggaac gggtcctcac cgccggggag accgcggtgg tcgtcgccga cgtcgactgg

22201 gcggccttcg ccgagtcgta cacctccgcc cggccccggc cgctgctcca ccggctcgtc 

22261 acacctgcgg cggcggtcgg cgagcgcgac gagccgcgtg agcagaccct ccgggaccgg 

22321 ctggcggccc tgccccgggc cgagcggtcg gcggagctgg tacgcctggt ccggcgggac 

22381 gccgcagccg tgctcggcag cgacgcgaag gccgtacccg ccaccacgcc gttcaaggac 

22441 ctcgggttcg actcgctggc cgcggtccgg ttccgtaacc ggctggccgc ccacaccggt 

22501 ctgcgtctgc cggccaccct ggtcttcgag cacccgaacg ccgcagccgt cgccgacctc 

22561 ctccacgacc gactcggcga ggccggcgag ccgacccccg tccggtcggt gggcgccgga 

22621 ctggccgcgc tggagcaggc cctgcccgac gcctccgaca cggagcgggt cgagctggtc 

22681 gagcgcctgg aacggatgct cgccgggctc cgccccgagg ccggagccgg ggccgacgcc 

22741 ccgaccgccg gtgacgacct gggggaggcc ggcgtcgacg aactcctcga cgcgctcgaa

22801 cgggaactcg acgccaggtg aacccgaact gaccgcagcc gcagccgaag cagagaccga

22861 ggacctgtga ctgacaacga caaggtggcg gagtacctcc gtcgtgcgac gctcgacctg 

22921 cgggccgccc gcaagcgcct gcgcgagctg caatccgacc cgatcgcggt cgtcggcatg 

22981 gcctgccgcc taccgggcgg ggtgcacctc ccgcagcacc tgtgggacct cctgcgccag
23041 gggcacgaga cggtgtccac cttccccacc gggcgcggct gggacctggc cgggctcttc 
23101 cacccggacc ccgaccaccc cggcaccagc tacgtcgacc ggggtgggtt cctcgacgac 
23161 gtggcgggct tcgacgccga gttcttcggg atctccccgc gcgaggccac ggccatggac
23221 ccgcaacagc ggctgctgtt ggagaccagt tgggagctgg tggagagcgc cggcatcgat 
23281 ccgcactccc tgcgtggcac cccgaccggc gtcttcctcg gcgtggcgcg gctcggctac 

23341 ggcgagaacg gcaccgaagc cggtgacgcc gagggctatt cggtgaccgg ggtggcaccc
23401 gctgtcgcct ccgggcggat ctcctacgcc ctcgggctgg agggtccgtc gatcagcgtg 
23461 gacaccgcgt gctcgtcgtc gttggtggcg ctgcacctgg cggtcgagtc gctgcggctg 
23521 ggcgagtcga gtctcgctgt cgtcggcggg gcggcggtca tggcgacacc aggggtgttc 
23581 gtcgacttca gccgccagcg ggcgttggcc gctgacggca ggtcgaaggc cttcggggcc 
23641 gccgccgacg ggttcggctt ctccgagggg gtctccctcg tcctgctcga acggctctcc 
23701 gaggccgaaa gcaacggcca cgaggtgttg gctgtcatcc gtggctccgc cctcaaccag
23761 gacggggcca gcaacggtct cgccgcgccg aacgggaccg cccagcgcaa ggtgatccgg 
23821 caggcgctac gaaactgcgg cctgaccccg gccgacgtgg acgccgtgga ggcgcacggc 
23881 accggcacca cgctcggcga cccgatcgag gccaacgccc tgctggacac ctacggccgt 
23941 gaccgggatc cggaccaccc gctgtggctg gggtcggtga agtcgaacat cggccacacg 
24001 caggcggcgg cgggcgtcac cgggctgctc aagatggtgc tggcactgcg ccacgaggaa 
24061 ctgcccgcca ccctgcacgt cgacgagccc accccgcacg tggactggtc ctcgggagcg 
24121 gtacgcctgg cgacccgggg ccggccgtgg cggcggggtg accggccgag gcgggccggg 
24181 gtgtcggcgt tcggcatcag cgggaccaac gcccacgtga tcgtcgagga ggcacccgag 
24241 cggaccaccg agcgcaccgt cggcggcgac gtcggcccgg tcccgctcgt ggtgtccgcc 
24301 cggtcggcgg cggcgctacg ggcccaggcg gcccaggtcg ccgagctggt ggagggctcc 
24361 gacgtcgggc tggcggaggt cgggcggagc ctggccgtga cccgggcgcg acacgagcac 
24421 cgggcggcgg tggtggcgtc gacccgggcc gaggcggtgc gggggctgcg cgaggtcgcg 
24481 gcggtcgaac cgcgcggcga ggacaccgtc accggggtcg ccgagacgtc cgggcgcacc 
24541 gtcgtcttcc tcttcccggg acaggggtcc cagtgggtcg ggatgggcgc ggagctgctg 
24601 gactcggcac cggcgttcgc cgacacgatc cgcgcctgcg acgaggcgat ggcaccgttg 
24661 caggactggt cggtctccga cgtgctccgg caggagccgg gggcaccggg actggaccgg 
24721 gtcgacgtgg tgcagccggt gctgctcgcg gtgatggtgt cgttggcgcg gttgtggcag 
24781 tcgtacgggg tcacccccgc tgcggtggtg gggcactcgc agggggagat cgccgccgcc 
24841 cacgtggcgg gtgcgctctc cctcgccgac gcggcgaggc tggtggtggg ccgcagccgg 
24901 ttgctgcggt cgctgtccgg gggcggcggc atgagcgccg tcgcgctcgg tgaggccgag 
24961 gtacgccgcc gactgcggtc gtgggaggac cggatctccg tggccgccgt caacggaccc 
25021 cggtcggtgg tggtggccgg ggaaccggag gcgctgcggg agtggggacg ggagcgggag 
25081 gccgagggcg tacgggtccg cgagatcgac gtcgactacg cctcgcactc gccgcagatc 
25141 gacagggtcc gtgacgaact cctgacggtc acgggggaga tcgagccccg gtcggcggag 
25201 atcaccttct actcgacggt cgacgtccgt gctgtcgacg gcaccgacct ggacgcgggg 
25261 tactggtacc gcaacctgcg ggagacggtc cggttcgccg acgcgatgac ccggttggcc 
25321 gactcgggat acgacgcgtt cgtcgaggtc agcccgcatc cggtggtggt gtcggcggtc 
25381 gccgaggcgg tcgaggaggc aggtgtcgag gacgccgtcg tcgtcggcac cctgtcccgg 
25441 ggcgacggcg gaccgggggc gttcctgcgg tcggcggcca ccgcccactg cgccggtgtg 
25501 gacgtcgact ggacgcccgc cctcccggga gctgcgacga tcccgttgcc gacgtacccg 
25561 ttccaacgga agccgtactg gctgcggtcg tctgctcccg cccccgcctc ccacgatctc 
25621 gcctaccggg tgtcctggac gccgatcacc ccgcccgggg acggcgtact cgacggcgac 
25681 tggctggtgg tgcaccccgg gggcagcacc ggatgggtcg acgggttggc ggcggcgatc 
25741 accgccggcg gtggccgggt cgtcgcccac ccggtggact ccgtgacctc ccggaccggc
25801 ctggccgagg cgctcgcccg gcgggacggc acgttccggg gggtgctgtc gtgggtggcg 
25861 accgacgaac ggcacgtcga ggccggtgcg gtcgccctgc tgaccctggc gcaggcgttg 
25921 ggtgacgccg gaatcgacgc accactgtgg cgcctgaccc aggaggcggt ccgtaccccc
25981 gtcgacggtg acctggcccg accggcgcag gccgccctgc acggtttcgc ccaggtcgcc
26041 cggctggagc tggcccgccg cttcggtggg gtgctcgacc tgcccgccac cgtcgacgcc 
26101 gccgggacgc gtctggtcgc ggcggtcctc gccggcggcg gcgaggacgt cgtcgccgtc 
26161 cgtggcgacc gtctctacgg ccgtcgcctg gtcagggcga ccctgccgcc gcccggcggg 
26221 gggttcaccc cgcacggcac cgtcctggtc accggcgcgg ccggtccggt gggcggtcgg 
26281 ctggcccggt ggctcgccga acggggtgcc acccgactcg tcctgcccgg cgcacacccg
26341 ggcgaggagt tgctgaccgc gatccgggcc gccggtgcca ccgccgtggt gtgcgaaccg 
26401 gaggcggagg cactgcgtac ggcgatcggc ggggagttgc cgaccgcgct cgtacacgcc 
26461 gagacgttga cgaacttcgc cggcgtcgcc gacgccgacc ccgaggactt cgccgccacc 
26521 gtcgcggcga agaccgcgct gccgacggtc ctggcggagg tgctcggcga ccaccgcctc 
26581 gaacgggagg tctactgctc gtcggtggcc ggggtctggg gtggggtcgg catggccgcg 
26641 tacgccgccg gcagcgccta cctcgacgcc ctggtcgagc accgtcgcgc ccgggggcac 
26701 gccagcgcct cggtggcctg gaccccgtgg gccctgcccg gcgcggtcga cgacggtcgg 
26761 ctgcgcgagc gcggcctgcg cagcctcgac gtggccgacg ccctcgggac gtgggaacgt 
26821 ctgctccgcg ccggtgcggt gtcggtggcc gtcgccgacg tcgactggtc ggtcttcaca 
26881 gagggtttcg cggccatccg gccgaccccg ctcttcgacg aactcctcga ccggcgcggg 
26941 gaccccgacg gcgcgcccgt cgaccggccg ggggagccgg cgggcgagtg gggtcgacga
27001 atcgcggcgc tgtccccgca ggaacagcgg gagacgttgc tgaccctcgt cggcgagacg 
27061 gtcgcggagg tgctgggaca cgagaccggc accgagatca acacccgtcg ggccttcagc 
27121 gaactcggcc tcgactcgct gggctcgatg gccctgcgtc agcgcctggc ggcccgtacc 
27181 ggcctgcgga tgccggcctc gctggtcttc gaccacccga cggtcaccgc gctcgcgcgg 
27241 tacctgcgtc gactggtcgt cggggactcc gacccgaccc cggtacgggt gttcggcccc 
27301 accgacgagg ccgaacccgt cgccgtggtc ggcatcggct gccggttccc cggcggcatc 
27361 gccacccccg aggacctctg gcgggtggtg tccgagggca cctccatcac caccggattc
27421 cccaccgacc ggggctggga cctccggcgg ctctaccacc ccgacccgga ccaccccggc 
27481 accagctacg tcgacagggg gggattcctc gacggggccc cggacttcga ccccgggttc 
27541 ttcgggatca ccccccgcga ggcgctggcg atggacccgc agcagcggct caccctggag 
27601 atcgcgtggg aggcggtgga acgggcgggc atcgacccgg agaccctcct cggcagcgac 
27661 accggcgtct tcgtcggcat gaacggccag tcctacctgc aactgctgac cggggagggt 
27721 gaccggctca acggctacca ggggttgggc aactcggcga gcgtgctctc cggccgtgtc 
27781 gcctacacct tcgggtggga ggggccggcg ctgacggtgg acaccgcctg ctcgtcctcg 
27841 ctggtcgcca tccacctcgc catgcagtcg ctgcgtcggg gtgagtgctc gctggcgttg 
27901 gccggcgggg tgacggtcat ggccgacccg tacaccttcg tggacttcag cgcacagcgg 
27961 gggctcgccg ccgacgggcg gtgcaaggcg ttctccgcgc aggccgacgg gttcgccctc
28021 gccgagggcg tcgcggcgct cgtcctcgaa ccgttgtcca aggcgcggcg aaacggccac 
28081 caggtgctgg cggtgctgcg cggcagcgcc gtcaaccagg acggggccag caacggcctc 
28141 gccgccccga acgggccgtc gcaggaacgg gtgatcaggc aggccctgac cgcctccggg 
28201 ctgcgtcccg ccgacgtcga catggtggag gcgcacggga cgggcaccga actcggcgac 
28261 ccgatcgagg ccggggcgct catcgcggcg tacggccggg accgggaccg gccgctctgg 
28321 ctgggctcgg tgaagacgaa catcggccac acccaggccg ccgccggtgc cgccggggtg 
28381 atcaaggcgg tcctggcgat gcggcacggc gtactcccga ggtcgctgca cgccgacgag 
28441 ttgtccccgc acatcgactg ggcggacggg aaggtcgagg tgctccgcga ggcacgacag 
28501 tggccccccg gtgagcgccc ccgccgcgcc ggggtgtcct ccttcggcgt cagcgggacc 
28561 aacgcccacg tcatcgtcga ggaggcaccc gccgaaccgg accccgaacc ggttcccgcc 
28621 gccccgggcg ggcccctgcc cttcgtcctg cacggacgca gcgtccagac ggtccggtcc 
28681 caggcgcgga ccctcgccga acacctgcgc accaccggcc accgggacct cgccgacacc 
28741 gcccgtaccc tggccaccgg tcgcgcccgt ttcgacgtcc gggccgcagt gctcggcacc 
28801 gaccgggagg gtgtctgcgc cgccctcgac gcgctggcgc aggatcgccc ctcgcccgac 
28861 gtcgtcgccc cggcggtctt cgccgcccgt acccccgtcc tggtcttccc cgggcagggg 
28921 tcgcagtggg tcggcatggc ccgtgacctg ctcgactcct ccgaggtgtt cgccgagtcg 
28981 atgggccggt gcgccgaggc gctgtcgccg tacaccgact gggacctgct cgacgtggtc 
29041 cgtggggtcg gcgaccccga cccgtacgac cgggtggacg tgctccagcc ggtgctgttc 
29101 gcggtgatgg tgtcgctggc gcggttgtgg cagtcgtacg gggtgactcc gggtgcggtg 
29161 gtgggtcact cgcaggggga gatcgccgcc gcgcacgtgg ctggtgcgtt gtcgttggcc 
29221 gacgccgcca gggtggtggc gttgcgcagc cgggtgctgc gggagctcga cgaccagggc 
29281 ggcatggtgt cggtcggcac ctcccgcgcc gagttggact cggtcctgcg ccggtgggac 
29341 gggcgggtcg cggtggcggc ggtgaacgga cccggcacgc tcgtggtggc cggacccacc 
29401 gccgaactgg acgagttcct cgcggtggcc gaggcccgcg agatgaggcc gcgtcggatc 
29461 gcggtgcgct acgcgtcgca ctccccggag gtggcccggg tcgaacagcg gctcgccgcc 
29521 gaactcggca ccgtcaccgc cgtcggcggc acggtcccgc tctactccac cgccaccggg 
29581 gacctcctcg acaccacagc catggacgcc gggtactggt accgcaacct gcgccaaccg 
29641 gtgctgttcg agcacgccgt ccgcagcctc ctggagcggg gattcgagac gttcatcgag 
29701 gtcagcccgc accctgtgct gctgatggcg gtcgaggaga ccgccgagga cgccgagcgc 
29761 ccggtcaccg gcgtgccgac gctgcgccgc gaccacgacg ggccgtcgga gttcctccgc 
29821 aacctcctgg gggcgcacgt gcacggggtc gacgtcgacc tgcgtccggc ggtcgcccac
29881 ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc agcggctctg gcccaagccg
29941 caccgcaggg ccgacacctc gtcgctgggg gtccgtgact cgacccaccc gctgctgcac
30001 gccgcagtcg acgtacccgg tcacggcgga gcggtgttca ccgggcggct ctcccccgac
30061 gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga acctggtgcc cggcagtgtc
30121 ctggtcgacc tcgcgctcac cgccggggcc gacgtcggcg tgccggtgct ggaggaactc
30181 gtcctgcagc agccgctggt gttgaccgcc gccggtgcgt tgctgcgcct gtcggtcggc
30241 gccgccgacg aggacgggcg gcggccggtc gagatccacg ccgccgagga cgtctccgac
30301 ccggccgagg cccggtggtc ggcgtacgcg accgggaccc tcgccgtcgg cgtggccggc
30361 ggcggccggg acggcacaca gtggcccccg cccggcgcca ccgccctgac gttgaccgac
30421 cactacgaca ccctcgccga actgggctac gagtacgggc cggcgttcca ggcgctgcgc
30481 gccgcgtggc agcacggcga cgtggtctac gcSgaggtgt ccctcgacgc cgtcgaggag
30541 gggtacgcgt tcgacccggt gctgctcgac gccgtcgccc agaccttcgg cctgaccagt
30601 cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca ccctgcacgc caccggggcc
30661 actgcggtac gggtggtggc gacccccgcc ggaccggacg cggtggccct gcgggtcacc
30721 gacccgaccg gtcagctcgt cgccacggtg gacgccctgg tcgtcaggga cgccggggcg
30781 gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc gcctggagtg ggtacggctg
30841 gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg cggccgacgg gctcgacgac
30901 ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg tccgctaccg tcccgacggc
30961 gacgacccga cggccgaggc ccgtcacggg gtgctctggg cggccacgct cgtgcgccgt
31021 tggctcgacg acgaccggtg gcccgccacc accctggtgg tggccacgtc cgcaggggtc
31081 gaggtctccc ccggggacga cgtgccgcgc cccggggccg ccgccgtgtg gggggtgctg
31141 cgctgcgccc aggcggagtc cccggaccgc ctcgtgctcg tcgacggcga cccggagacg
31201 cccccggcgg tgccggacaa tccgcagctc gcggtccgtg acggtgcggt gttcgtgcca
31261 cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg accgggcgta ccggctggtg
31321 cccggcaacg gcggctccat cgaggcagtg gccttcgccc ccgtccccga cgccgaccgg
31381 cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca ccggcgtgaa cttccgtgac
31441 gtcctgctcg cgctcggcat gtacccggaa ccggccgaga tgggcaccga ggcgtccggt
31501 gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc ccggccaggc ggtgacgggc
31561 ctgttccagg gggccttcgg gccggtggcg gtcgccgacc accggctcct caccccggtc
31621 cccgacgggt ggcgggcggt ggacgccgca gccgtaccca tcgcgttcac caccgcccac
31681 tacgcgctgc acgacctggc cgggttgcag gccgggcagt ccgtgctggt ccacgccgcc
31741 gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc gggccggggc ggaggtgttc
31801 gccacggcca gcccggccaa acacccgacg ctgcgggcgc tcggcctcga cgacgaccac
31861 atcgcctcgt cccgggagag cgggttcggt gagcggttcg ccgcgcgtac cggggggcgg
31921 ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc tcgacgagtc cgcgcggctg
31981 ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg acctgcggcc ggcggagcag
32041 ttccggggcc ggtacgtccc gttcgacctg gccgaggccg gtcccgatcg gctcggcgag
32101 atcctggagg aggtcgtcgg tctgctggcc gccggtgccc tcgaccggtt gccggtgtcg
32161 gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca tgagccgggg ccgacacgtg
32221 ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg acggaacggt gctggtcacc
32281 ggcgggaccg gcaccctggg gcggctggtc gcccgccacc tggtgaccgg gcacggcgta
32341 ccccacctcc tggtggccag ccggcgcggt ccggcggccc cgggcgcggc cgagctgcgc
32401 gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg cctgcgacac cgccgaccgg
32461 gaggcgctcg cggcgctgct cgactcgatc cccgcggacc gtccgctgac cggggtggtg
32521 cacaccgccg gggtcctggc cgacgggctg gtcacctcca tcgacgggac cgccaccgat
32581 caggtcctgc gggccaaggt cgacgcggcg tggcacctgc acgacctgac ccgggacgcg
32641 gacctgagct tcctcgtgct gttctcgtcg gcggcgtcgg tgctggccgg tcccgggcag
32701 ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg ccgggcaacg gcgggccctc
32761 ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc aggccagcga gatgaccagc
32821 ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc tgccgaccga gcgggcgctg
32881 gccctgttcg acgcggctct gcgcagcggc ggggaggtgc tgttcccgct gtctgtcgac
32941 aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc tgcgcggcgc ggtccggtcc
33001 acgccacggg ccgccaacag ggccgagacc ccgggccggg gcctgctcga ccgtctcgtc
33061 ggtgcacccg agaccgatca ggtggccgcg ctggccgagc tggtccgctc gcacgcggcg
33121 gcggtcgccg gctacgactc ggccgaccag ctgcccgaac gcaaggcgtt caaggacctc
33181 gggttcgact cgctggcggc ggtggagctg cgcaaccggc tcggcgtcac caccggcgta
33241 cggctgccca gcacgctggt gttcgaccac ccgacaccgc tggcggtggc cgaacacctg
33301 cggtcggagt tgttcgccga ctccgcgccg gacgtcgggg tcggtgcgcg cctcgacgac
33361 ctggaacggg cgctcgacgc cctgcccgac gcgcagggac acgccgacgt cggggcccgc
33421 ctggaggcgc tgctgcgccg gtggcagagc cgacgacccc cggagaccga gccagtgacg
33481 atcagtgacg acgccagtga cgacgagctg ttctcgatgc tcgacaggcg tctcggcggg
33541 ggaggggacg tctaggtgac aggtcgattc cgccccgcgg cagtggaccg taccgccctg
33601 acaggtccac cgggttcgcg tcgcctccca cacccgacgg ccggggtatc cacggaaggg
33661 atccgatgag cgagagcagc ggcatgaccg aggaccgcct ccggcgctat ctcaagcgca
33721 ccgtcgccga actcgactcg gtgacaggtc ggctcgacga ggtcgagtac cgggcccgcg 
33781 aaccgatcgc cgtcgtcggc atggcctgcc ggttccccgg gggtgtggac tcgccggagg 
33841 cgttctggga gttcatccgc gacggtggtg acgcgatcgc cgaggcgccc acggaccgtg 
33901 gctggccgcc ggcaccgcga ccccgcctcg gtggtctcct cgcggagccg ggcgcgttcg 
33961 acgccgcctt cttcggcatc tcaccccgcg aggcgctcgc gacggacccc cagcagcgcc
34021 tgatgctgga gatctcctgg gaggcgttgg agcgtgcggg tttcgacccg tcgagcctgc 
34081 gcggcagcgc cggtggcgtc ttcaccggtg tcggtgcggt ggactacgga cccaggccgg 
34141 acgaggcacc cgaggaggtg ctcggctacg tcggcatcgg caccgcctcc agcgtcgcct 
34201 ccggacgggt ggcgtacacc ctggggttgg agggtccagc cgtcaccgtc gacaccgcct 
34261 gctcctccgg gctcaccgcg gtgcacctgg cgatggagtc gctgcgccgc gacgagtgca 
34321 ccctggtcct cgccggtggg gtcaccgtga tgagcagccc gggtgcgttc accgagttcc 
34381 gcagccaggg cgggttggcc gaggacggcc gctgcaaacc gttctcccgc gccgccgacg 
34441 gcttcgggct cgccgagggg gccggggtcc tggtgcccca acggctgtcc gtcgcccggg
34501 ccgagggccg gccggtgctg gccgtactgc gtggctcggc gatcaaccag gacggtgcca 
34561 gcaacgggct caccgcgccg agcggccccg cccagcggcg ggtgatcagg caggcgttgg 
34621 agcgggcgcg gctgcgtccc gtcgacgtgg actacgtgga ggcccacggc accggcaccc 
34681 ggctgggcga tccgatcgag gcgcacgccc tgctcgacac gtacggtgcc gaccgggaac 
34741 ccggccgccc gctctgggtc ggatcggtga agtccaacat cggtcacacc caggcggcgg 
34801 cgggggtggc cggggtgatg aagaccgtgc tggcgctgcg gcatcgggag atcccggcga 
34861 cgttgcactt cgacgagccc tcgccgcacg tcgactggga ccggggtgcg gtgtcggtgg 
34921 tgtccgagac ccggccctgg ccggtggggg agcgcccgcg ccgggcgggg gtgtcctcgt 
34981 tcggcatcag cggcaccaac gcgcacgtca tcgtcgagga ggcgccgagc ccgcaggcgg 
35041 ccgacctcga cccgaccccc ggcccggcaa ccggagcgac cxccggaacg gatgccgccc 
35101 ccaccgccga gccgggtgcg gaggcggtcg cactggtgtt ctccgcgcgc gacgagcggg
35161 ccctgcgcgc ccaggcggcc cggctcgccg accgtctcac cgacgacccg gccccctcgt 
35221 tgcgcgacac cgccttcacc ctggtcaccc gccgtgccac ctgggagcat cgggcggtcg
35281 tcgtcggcgg gggcgaggag gtcctcgccg gcctccgggc cgtcgccggg ggacgtcccg 
35341 tcgacggagc cgtcagcggg cgggcgcgcg ccggccgccg ggtggtgctg gtcttccccg 
35401 ggcagggcgc acagtggcag ggcatggccc gggacctgct gcggcagtcg ccgaccttcg 
35461 cggagtccat cgacgcctgc gagcgggcgc tcgccccgca cgtggactgg tcgctgcgcg 
35521 aggtgctcga cggcgagcag tcgttggacc ccgtcgacgt ggtgcagccg gtgctgttcg 
35581 cggtgatggt gtcgttggcg cggttgtggc agtcgtacgg ggtgactccg ggtgcggtgg 
35641 tgggtcactc gcagggggag atcgccgccg cgcacgtggc tggtgcgttg tcgttggccg 
35701 acgccgccag ggtggtggcg ttgcgcagcc gggtgctgcg ccgtctcggt ggtcacggcg 
35761 ggatggcgtc gttcgggctc caccccgacc aggccgccga gcggatcgcg cgcttcgcgg 
35821 gtgcgctgac tgtcgcctcg gtcaacggtc cccgttcggt ggtgctggcc ggggagaacg 
35881 gcccgttgga cgagctgatc gccgagtgcg aggccgaggg cgtgaccgcc cgtcggatcc 
35941 ccgtcgacta cgcctcacac tccccgcagg tggagtcgct gcgtgaggag ctgctcgccg 
36001 cactggccgg ggtccgtccg gtgtcggccg ggatccccct gtactcgacc ctgaccggtc 
36061 aggtcatcaa aacggcgacg atggacgccg actactggtt cgccaacctc cgggagccgg
36121 tgcgcttcca ggacgccacc aggcagctcg ccgaggcggg gttcgacgcc ttcgtcgagg 
36181 tcagcccaca ccccgtgttg acagtcggtg tcgaggccac cctcgaggca gtgctgcccc 
36241 ccgacgccga tccgtgtgtc acaggcaccc tgcgccgcga acgcggcggt ctcgcgcagt 
36301 tccacaccgc gctcgccgag gcgtacaccc ggggggtgga ggtcgactgg cgtaccgcag 
36361 tgggtgaggg acgcccggtc gacctgccgg tctacccgtt ccaacgacag aacttctggc 
36421 tcccggtccc cctgggccgg gtccccgaca ccggcgacga gtggcgttac cagctcgcct 
36481 ggcaccccgt cgacctcggg cggtcctccc tggccggacg ggtcctggtg gtgaccggag
36541 cggcagtacc cccggcctgg acggacgtgg tccgcgacgg cctggaacag cgcggggcga 
36601 ccgtcgtgtt atgcaccgcg cagtcgcgcg cccggatcgg cgccgcactc gacgccgtcg 
36661 acggcaccac cctgtccact gtggtctctc tgctcgcgct cgccgagggc ggtgctgtcg 
36721 acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 
36781 tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 
36841 cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 
36901 ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca 
36961 tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 
37021 cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 
37081 tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 
37141 cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 
37201 acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg 
37261 ccgaccgcga ccggttggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca 
37321 cggcggtgtt ccacgccgcc gggatctccc ggtccacagc ggtacaggag ctgaccgaga 
37381 gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct
37441 gtcccgacct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg
37501 ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc
37561 gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg 
37621 gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg
37681 cgatcgagga gctgcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc 
37741 tggaccggga gcggttcgtc gaactgttca ccgccgcccg ccgccggccc ctcttcgacg 
37801 aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc 
37861 ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg 
37921 aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc 
37981 gtgacctggg attcgactcc atgaccgccg tcgacctgcg gaaccggctc gcggcggtga 
38041 ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg 
38101 cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg
38161 tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc
38221 tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg 
38281 cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc 
38341 ccgagcggca cggcaccagc tactcccggc acggcgcgtt cctggacggg gcggccgact
38401 tcgacgcggc gttcttcggg atctcgccgc gtgaggcgtt ggcgatggat ccgcagcagc
38461 ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc
38521 tgcgcggtac ggacaccggt gtcttcctcg gcgctgcgta ccaggggtac ggccagaacg 
38581 cgcaggtgcc gaaggagagt gagggttacc tgctcaccgg tggttcctcg gcggtcgcct 
38641 ccggtcggat cgcgtacgtg ttggggttgg aggggccggc gatcactgtg gacacggcgt
38701 gttcgtcgtc gcttgtggcg ttgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg 
38761 ggctcgcggt ggcgggtggg gtgtcggtga tggccggtcc ggaggtgttc accgagttct
38821 ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg 
38881 ggttcggatt cgccgagggc gtcgctgtgg tgctcctgca gcggttgtcg gtggcggtgc 
38941 gggaggggcg tcgggtgttg ggtgtggtgg tgggttcggc ggtgaatcag gatggggcga 
39001 gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg 
39061 gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc
39121 ggttggggga tccggtggag ttgggggcgt tgttggggac gtatggggtg ggtcggggtg
39181 gggtgggtcc ggtggtggtg ggttcggtga aggcgaatgt gggtcatgtg caggcggcgg
39241 cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga
39301 tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg 
39361 cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg 
39421 cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg 
39481 tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg 
39541 tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac 
39601 tgcacgacgc cgtcgacgac accgtcgccc tcccggcggt ggccgccacc ctcgccaccg 
39661 gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg 
39721 acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt 
39781 cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc 
39841 gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt 
39901 cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt 
39961 tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt 
40021 tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg 
40081 cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc 
40141 gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg 
40201 ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca 
40261 actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc 
40321 actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg 
40381 cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc 
40441 gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg 
40501 acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg 
40561 ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg 
40621 ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg 
40681 gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact 
40741 ggcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga 
40801 cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg 
40861 actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg 
40921 acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg 
40981 ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg 
41041 gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg 
41101 aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc 
41161 acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg
41221 agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg 
41281 gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg 
41341 acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt
41401 ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 
41461 cgcggtacct ggcccggtcc ggggtgaccg atctcgtcct gctcagcagg agcggccccg 
41521 acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 
41581 tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 
41641 aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 
41701 tcgaccggat cgacgaactg gagtcggtca gcgccgcgaa ggtgaccggg gcgcggctgc 
41761 tcgacgagct ctgcccggac gccgacacct tcgtcctgtc ctcctcgggg gcgggagtgt 
41821 ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 
41881 accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 
41941 acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 
42001 cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 
42061 tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 
42121 tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 
42181 tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 
42241 tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 
42301 actcgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 
42361 ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 
42421 agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 
42481 ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 
42541 agttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 
42601 tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 
42661 ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 
42721 tcgcgcaacc cgggtacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 
42781 gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac
42841 actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc 
42901 cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 
42961 cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 
43021 cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 
43081 ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 
43141 gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 
43201 cgatggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 
43261 agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg 
43321 actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 
43381 cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 
43441 cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 
43501 cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 
43561 ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 
43621 gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct
43681 ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 
43741 gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 
43801 ggcggtgacc gaggcggcga ccgccgcggt gcccggggac ccgcaccggc gggcgctgtt 
43861 caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 
43921 ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 
43981 ggcgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 
44041 ggtgggcgag cacacggtcg cggcgggcga cgaggtcgtc gtggtggtcg ccgccgccaa 
44101 ccgtgacgcg ggggtcttcg ccgacccgga ccgcctcgac ccggaccggg ccgacgccga 
44161 ccgggccctg tccgcccagc gcggtcaccc cggccggttg gaggagctgg tggtggtcct 
44221 gaccaccgcc gcactgcgca gcgtcgccaa ggcgctgccc ggtctcaccg ccggtggccc 
44281 ggtcgtcagg cgacgtcgtt caccggtcct gcgagccacc gcccactgcc cggtcgaact 
44341 ctgaggtgcc tgcgatgcgc gtcgtcttct cctccatggc cagcaagagc cacctgttcg 
44401 gtctcgttcc cctcgcctgg gccttccgcg cggcgggcca cgaggtacgg gtcgtcgcct 
44461 caccggctct caccgacgac atcacggcgg ccggactgac ggccgtaccg gtcggcaccg 
44521 acgtcgacct tgtcgacttc atgacccacg ccgggtacga catcatcgac tacgtccgca 
44581 gcctggactt cagcgagcgg gacccggcca cctccacctg ggaccacctg ctcggcatgc 
44641 agaccgtcct caccccgacc ttctacgccc tgatgagccc ggactcgctg gtcgagggca 
44701 tgatctcctt ctgtcggtcg tggcgacccg actggtcgtc tggaccgcag accttcgccg 
44761 cgtcgatcgc ggcgacggtg accggcgtgg cccacgcccg actcctgtgg ggacccgaca 
44821 tcacggtacg ggcccggcag aagttcctcg ggctgctgcc cggacagccc gccgcccacc 
44881 gggaggaccc cctcgccgag tggctcacct ggtctgtgga gaggttcggc ggccgggtgc 
44941 cgcaggacgt cgaggagctg gtggtcgggc agtggacgat cgaccccgcc ccggtcggga 
45001 tgcgcctcga caccgggctg aggacggtgg gcatgcgcta cgtcgactac aacggcccgt 
45061 cggtggtgcc ggactggctg cacgacgagc cgacccgccg acgggtctgc ctcaccctgg 
45121 gcatctccag ccgggagaac agcatcgggc aggtctccgt cgacgacctg ttgggtgcgc
45181 tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg
45241 cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga 
45301 cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg
45361 gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg
45421 aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg
45481 aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg
45541 ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 
45601 gggaacggac cgccgtcgga tgagcaccga cgccacccac gtccggctcg gccggtgcgc 
45661 cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga 
45721 cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga 
45781 cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc
45841 cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 
45901 cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 
45961 cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt 
46021 ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg 
46081 ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 
46141 ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca 
46201 tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc 
46261 gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc
46321 gggacagccg acggcactgc gctcggcggt ggaggcgtac gaggtgttct gcagagacct 
46381 cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg 
46441 ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt
46501 cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 
46561 aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt 
46621 gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg
46681 cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 
46741 cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc 
46801 gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 
46861 cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 
46921 cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg 
46981 catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 
47041 ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 
47101 acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 
47161 ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 
47221 gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 
47281 ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg
47341 gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 
47401 catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 
47461 accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 
47521 acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 
47581 ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 
47641 ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt 
47701 acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 
47761 ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 
47821 tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 
47881 ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 
47941 cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 

(SEQ ID NO: 1) 
SEQUENCE LISTING

<110> Kosan Biosciences, Inc. 

<120> Recombinant Megalomicin Biosynthetic 
Genes and Uses Thereof 

<130> 300622004740 

<140> To be assigned 
<141> Herewith 

<150> US 60/158,305
<151> 1999-10-08 

<150> US 60/190,024 
<151> 2000-03-17 

<160> 34 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 47981 
<212> DNA 

<213> Micromonospora megalomicea 

<220> 

<221> CDS 

<222> (1) . . . (144)

<223> megBVI (megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase;
SEQ ID NO: 2= translated amino acid sequence

<221> CDS 

<222> (928) . . . (2061) 

<223> megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase,
TDP-4-keto-6-deoxyhexose 3,4-isomerase;
SEQ ID NO: 3= translated amino acid sequence 

<221> CDS

<222> (2072) . . . (3382)

<223> megDI, rhodosaminyl transferase (eryCIII homolog),
TDP-megosamine glycosyltransferase;
SEQ ID NO: 4= translated amino acid sequence 

<221> CDS 

<222> (3462) . . . (4634)

<223> megG (megY), mycarosyl acyltransferase, mycarose O-acyltransferase;
SEQ ID NO: 5= translated amino acid sequence 

<221> CDS 

<222> (4651) . . . (5775) 

<223> megDII, deoxysugar transaminase (eryCI, DnrJ homolog), 
TDP-3-keto-6-deoxyhexose 3-aminotransaminase;
SEQ ID NO: 6= translated amino acid sequence 

<221> CDS 

<222> (5822) . . . (6595) 

<223> megDIII, daunosaminyl-N,N-dimethyltransferase (eryCVI homolog);
SEQ ID NO: 7= translated amino acid sequence 



<221> CDS

<222> (6592) . . . (7197) 

<223> megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU
homolog), TDP-4-keto-6-deoxyhexose 3,5-epimerase;
SEQ ID NO: 8= translated amino acid sequence 

<221> CDS 

<222> (7220) . . . (8206) 

<223> megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog),
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 9= translated amino acid sequence

<221> CDS 

<222> (8228) . . . (9220)

<223> megBII-1 (megDVII), TDP-4-keto-L-6-deoxy-hexose 2,3-reductase;
SEQ ID NO: 10= translated amino acid sequence 

<221> CDS
<222> (9226) . . . (10479) 

<223> megBV, mycarosyl transferase, mycarose glycosyltransferase;
SEQ ID NO: 11= translated amino acid sequence 

<221> CDS 

<222> (10483) . . . (11424 ) 

<223> megBIV, TDP-hexose 4-ketoreductase,
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 12= translated amino acid sequence 

<221> CDS 

<222> (12181) . . . (22821)

<223> megAI; SEQ ID NO: 13= translated amino acid sequence 

<221> misc_feature 
<222> (12505) . . . (13470) 
<223> megAI, AT-L 

<221> misc_feature
<222> (13576) . . . (13791)
<223> megAI, ACP-L 

<221> misc_feature 
<222> (13849) . . . (15126)
<223> megAI, KS1 

<221> misc_feature 
<222> (15427) . . . (16476) 
<223> megAI, AT1

<221> misc_feature 
<222> (17155) ... (17694) 
<223> megAI, KR1 

<221> misc_feature
<222> (17947) . . . (18207)
<223> megAI, ACP1

<221> misc_feature 
<222> (18268) . . . (19548) 
<223> megAI, KS2 

<221> misc_feature
<222> (19876) . . . (20910)
<223> megAI, AT2

<221> misc_feature 
<222> (21517) . . . (22053) 
<223> megAI, KR2 

<221> misc_feature
<222> (22318) . . . (22575) 
<223> megAI, ACP2 

<221> CDS 

<222> (22867) . . . (33555) 

<223> megAII; SEQ ID NO: 14= translated amino acid sequence 

<221> misc_feature 
<222> (22957) . . . (24237) 
<223> megAII, KS3 

<221> misc_feature 
<222> (24544) . . . (25581) 
<223> megAII, AT3 

<221> misc_feature 
<222> (26230) . . . (26733) 
<223> megAII, KR3 (inactive) 

<221> misc_feature
<222> (26998) . . . (27258) 
<223> megAII, ACP3 

<221> misc_feature 
<222> (27393) . . . (28590) 
<223> megAII, KS4 

<221> misc_feature 
<222> (28897) . . . (29931) 
<223> megAII, AT4

<221> misc_feature 
<222> (29953) . . . (30477) 
<223> megAII, DH4 

<221> misc_feature 
<222> (31396) . . . (32244)
<223> megAII, ER4 

<221> misc_feature
<222> (32257) . . . (32799) 
<223> megAII, KR4 

<221> misc_feature 
<222> (33052) . . . (33312) 
<223> megAII, ACP4 

<221> CDS 

<222> (33666) . . . (43271) 

<223> megAIII; SEQ ID NO: 15= translated amino acid sequence 

<221> misc_feature
<222> (33780) . . . (35027)
<223> megAIII, KS5

<221> misc_feature 
<222> (35385) . . . (36419) 
<223> megAIII, AT5

<221> misc_feature
<222> (37068) . . . (37604) 
<223> megAIII, KR5 

<221> misc_feature 
<222> (37860) . . . (38120) 
<223> megAIII, ACP5 

<221> misc_feature
<222> (38187) . . . (39470) 
<223> megAIII, KS6 

<221> misc_feature
<222> (39795) . . . (40811)
<223> megAIII, AT6

<221> misc_feature 
<222> (41406) . . . (41936) 
<223> megAIII, KR6 

<221> misc_feature 
<222> (42168) . . . (42425) 
<223> megAIII, ACP6 

<221> misc_feature 
<222> (42585) . . . (43271) 
<223> megAIII, TE 

<221> CDS 

<222> (43268) . . . (44344) 

<223> megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase;
SEQ ID NO: 16= translated amino acid sequence 

<221> CDS 

<222> (44355) . . . (45623) 

<223> megCIII, desosaminyl transferase, desosamine glycosyltransferase;
SEQ ID NO: 17= translated amino acid sequence

<221> CDS 

<222> (45620) . . . (46591) 

<223> megBII-2 (megBII), TDP-4-keto-6-deoxy-L-glucose 2,3 dehydratase,
TDP-4-keto-6-deoxyglucose 2,3 dehydratase; 
SEQ ID NO: 18= translated amino acid sequence 

<221> CDS 

<222> (46660) . . . (47403) 

<223> megH, TEII; SEQ ID NO: 19= translated amino acid sequence 
<221> CDS 

<222> (47411) . . . (47980) 

<223> megF, C-6 hydroxylase; SEQ ID NO: 20= translated amino acid sequence 
<400> 1 

ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 60
gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 120 






atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 180 

taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 240

cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 300 

gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 360 

tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 420 

gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 480 

ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 540 

catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 600 

cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 660 

ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 720 

gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 780 

ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 840

ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 900 

tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 960 

gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 1020

atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 1080 

ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 1140 

gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 1200 

caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 1260 

accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 1320 

ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 1380 

gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgacrrg tctccggtcc 1440

tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 1500 

accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 1560 

gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 1620 

gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 1680 

cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 1740 

gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 1800 

gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 1860 

cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 1920 

gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 1980 

ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 2040

cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 2100 

caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 2160 

ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 2220 

cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat 2280 

cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 2340 

cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 2400 

ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 2460 

ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 2520 

gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 2580 

ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 2640 

gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 2700 

cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 2760 

gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 2820 

tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc 2880 

catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga 2940 

cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt 3000 

gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 3060 

ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 3120 

ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 3180 

gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct 3240 

cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 3300 

catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 3360 

cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg 3420 

aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 3480 

tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 3540 

tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaacagctac gtgtacgccg 3600 

acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 3660 

tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc 3720

gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 3780 
tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 3840

tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 3900 

tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 3960 

tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac 4020 

cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 4080

ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 4140

tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 4200

ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg 4260

ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 4320 

ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg 4380

tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 4440

ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 4500

tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 4560

gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 4620

cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 4680

gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 4740 

agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 4800

atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 4860

gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 4920

ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac 4980

ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg 5040 

gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 5100 

ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 5160 

gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 5220 

tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 5280 

ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc 5340 

cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 5400 

gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac 5460 

tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac 5520 

gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 5580 

tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 5640 

gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc 5700

atgtacccct ccctccctca cgacctgcag gacagggtga tcgaggcggt gcgggaggtc 5760 

atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 5820 

catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 5880 

catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct 5940 

cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 6000 

gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct 6060 

gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 6120 

cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 6180 

caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 6240

cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 6300 

cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 6360 

caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt 6420

ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc 6480

ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca 6540 

cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 6600 

gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 6660 

ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 6720 

ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac 6780 

ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg 6840 

gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag 6900 

ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc 6960 

tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag 7020 

gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac 7080

ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg 7140 

atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc 7200 

cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt 7260 

cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt 7320 

cgtgccgtcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg 7380

agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag 7440 
gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg 7500
cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt 7560 
cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga 7620 
gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa 7680 
ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg 7740 
ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc 7800 
cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta 7860
cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg 7920
acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt 7980 
ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc 8040
tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac 8100 
ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc 8160
ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg 8220 
gttcgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc 8280 
ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg 8340
gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac 8400
cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg 8460
ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg 8520 
ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca 8580 
cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca 8640 
ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct 8700 
gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac 8760
tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt 8820 
ggtgcgtctg gtagatgtcg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg 8880 
cggcgacgat gtgtcgggcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc 8940 
ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc 9000 
cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga 9060 
tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca 9120 
cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc 9180 
cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg 9240 
gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag 9300 
cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc 9360 
gagagcctgc cagagggtgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc 9420 
cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg 9480 
cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc 9540 
ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg 9600 

caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag 9660 
ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc 9720 
cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga 9780 

ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa 9840 

gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag 9900 

gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc 9960 

gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc 10020 

ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt 10080 

gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc 10140 

gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc 10200

gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt 10260 

gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc 10320 

gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc 10380 

cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa 10440 

gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat 10500 

gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt 10560 

cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt 10620 

ctccggtgga gggacgctga cgacgggcac cgcagggttg ccggtctgac gtgccacgct 10680 

ggcggcgacg gtctcgaaga tctcgccgag gggtcgggcc tcgtccgcgc tcggcgtcca 10740 

gacgtcgccg accagcgcct cgtggttgtg cagtgcggcg gtgaacgcgg tggccacgtc 10800 

ctcgacgtgc aggaggttgc ggcgcacgct gccctcgtgc cacatcgtga tcggctcacc 10860 

ggcgagggct cgccggatca tggcggtgac gacaccccgg ccggtctgcc ccgacgggcc 10920 

gctgtggccg tagatcgcgg gcaggcgcag gatcaccccg tcgacgaccc cgtcctcggt 10980 

ggcctgacgc aggatccgct cggcctcgat cttgtgctgg gcgtaccggc tgggggcggc 11040

ggggttcgcg gcctgggtgg tgctggcgaa caggagcacc ggcgcgggtc cgggtcttgc 11100 
ccgcagcgcg gcgacgaggt cgcgcatgat gcccgcgttg acgcgttcgg cctcgggcac 11160

cgtggcggcg ctgcgccagg tcgacccgcc ggcggcgtag gcgaccagat gcacgacgac 11220 

gtcggtgtcg gcgacgacct gcgcgacccg gccgggttcg agcaggtcga ctcgaaggtg 11280 

ctcgatcccg gcgctgcctg gtggctggtc gcgagacccg gtgcgcgcga cggcccgcag 11340 

tcggagaggg tgtgtggtaa attcgcgaag aagggcgctt ccgacgaatc cagaaacgcc 11400 

gagaagtgtg acatgtcttg tcatctacta atgcattccg atagccaccg gcgcatggaa 11460 

tccatttgtt ccccccaggg tggtgtcggg tgacaaatcc ggcctcaggt cggcctcaag 11520 

cctctttcga gcgggtgctg aggcttcccg cgtaccctcg gtggcctgcg ttcgggcggg 11580 

tgtcggggaa agggcggatc gaggagttcg gtagggcgtc gcggcgcgta ctccgggact 11640 

gatccgggtc gacgccccga cgcgtgacag ggcgtcgatc cgtgccgccc gtaccgccgg 11700 

ttttcggcga tggtcgcaga ttcctcccga cgtggtggac tcattggttc tcccgggtgt 11760

ggccgcaccg tcggtggcct cgtcgggggt gtcggagacc gggtcgatcg ccgtccccgg 11820 

ccgtgccgac cagggtcggt ccgtcgccga ggtgggtcac cgtcgggtgg acccggtccg 11880 

ccggcggcca ccgcccgatc gtgcccacct tcgcctccgc gggtaaatgc ttcgtcgatc 11940 

tgatcgacac ttccggcgac gctatcaccg gagcattccc cggcaccacc ggtcgatgcc 12000 

tcgcgctttc caaacaggga aaacagcagc tcacagcggt tccaggcgcc gggcaatcct 12060 

agcgaagagt ctcgatgggg tcaaggtgaa ttctgtcaca gatgtttttg ttaaatgtac 12120 

tttcttcagc caccctcgac gttcatacaa ttggccggca tctctaccaa gggggagtga 12180 

gtggttgacg tgcccgatct actcggcacc cggactccgc acccagggcc gctcccattc 12240 

ccgtggcccc tgtgcggtca caacgaaccg gagctgcggg cccgcgcccg tcaattgcac 12300 

gcatatctcg aaggcatttc cgaggatgac gtggtggccg tcggcgccgc cctcgcgcgc 12360 

gagacacgcg cgcaggacgg gccgcaccgc gccgtcgtcg tggcctcctc ggtcaccgag 12420 

ctgaccgccg cgctcgccgc cctcgcccag ggccgcccac acccctcggt ggtacgcggt 12480 

gtcgcccgac ccacggcacc ggtggtgttc gtcctgcccg gtcagggcgc ccagtggccc 12540 

ggcatggcga cccgactgct cgccgagtcg cccgtcttcg ccgcggcgat gcgggcctgc 12600 

gagcgggcct tcgacgaggt caccgactgg tcgttgaccg aggtcctgga ctcacccgag 12660 

cacctgcgcc gcgtcgaggt ggtccagccc gcgctcttcg cggtgcagac ctcactggcc 12720 

gccctgtggc ggtcgttcgg ggtgcgaccc gacgccgtac tcggacacag catcggtgag 12780 

ctggccgccg ccgaggtctg cggcgccgtc gacgtcgagg ccgccgcgcg ggccgccgcc 12840 

ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc 12900 

tccccggccg agctggcagc ccgggtcgag cggtgggacg acgacgtcgt gccggccggg 12960 

gtcaacggtc cccggtcggt gctgctcacc ggcgctcccg agcccatcgc acggcgggtc 13020

gccgagctgg cggcacaggg cgtacgcgcc caggtcgtca acgtgtcgat ggcggcgcac 13080 

tcggcgcagg tcgacgccgt cgccgagggc atgcgctcgg cgctgacctg gttcgccccc 13140 

ggcgactccg acgtgcccta ctacgccggc ctcaccggcg ggcggctgga cacccgggaa 13200 

ctcggcgccg accactggcc gcgcagtttc cggctcccgg tgcgcttcga cgaggcgacc 13260 

cgtgcggtcc tggaactgca gcccggcacg ttcatcgagt cgagcccgca cccggtgctg 13320 

gcggcctccc tgcagcagac cctcgacgag gtcgggtccc cggccgcgat cgtgccgacc 13380 

ctgcaacgcg accagggcgg tctgcggcgg ttcctgctcg ccgtggcgca ggcgtacacc 13440 

ggtggcgtga cagtcgactg gaccgccgcc taccccgggg tgacccccgg ccacctgccg 13500 

tcggccgtcg ccgtcgagac cgacgaggga ccctcgacgg agttcgactg ggccgcgccc 13560 

gaccacgtac tgcgcgcgcg gctgctggag atcgtcggcg ccgagacggc cgcgctcgcc 13620 

gggcgggagg tcgacgcccg ggccaccttc cgggaactgg gcctcgactc ggtcctcgcg 13680 

gtgcagctgc ggacccgcct cgccacggcg accgggcggg atctgcacat cgccatgctc 13740 

tacgaccacc cgaccccgca cgccctcacc gaggcgctgc tgcgcggccc gcaggaggag 13800 

ccggggcggg gtgaggagac ggcacacccg acggaggccg aacccgacga acccgtcgcc 13860 

gtggtcgcca tggcgtgccg gctgcccggc ggcgtcacct caccggagga gttctgggag 13920 

ctgctggccg aggggcggga cgccgtcggc gggctgccca ccgaccgggg atgggacctg 13980 

gactcgctgt tccacccgga cccgacccgg tcgggcacgg cgcaccagcg cgctggtggc 14040 

ttcctcaccg gcgccacctc cttcgacgct gccttcttcg ggctgtcgcc acgggaggca 14100 

ctggccgtcg agccgcagca gcggatcacg ttggagctgt cgtgggaggt gctggaacgc 14160 

gccgggatcc ccccgacgtc gttgcggacc tcccggaccg gggtgttcgt cggtctgatc 14220 

ccccaggagt acggcccccg gctggccgag gggggtgagg gcgtcgaggg ctacctgatg 14280 

accgggacca ccaccagcgt cgcctccggt cgggtcgcct acaccctcgg cctggagggg 14340 

ccggcgatca gcgtcgacac cgcctgctcg tcgtcgctcg tcgccgtgca cctggcgtgc 14400

cagtcgctgc ggcgcggcga gtcgacgatg gcgctcgccg gtggcgtgac ggtgatgccg 14460

acaccgggca tgctcgtgga cttcagtcgg atgaactccc tcgcccccga cggacggtcc 14520 

aaggcgttct cggccgccgc cgacgggttc ggcatggccg aaggcgcagg gatgctcctg 14580 

ctggaacggc tctcggacgc ccgccgccac ggccacccgg tgctcgccgt gatcaggggc 14640

accgctgtca actccgacgg cgcgagcaac ggactctccg ccccgaacgg ccgggcccag 14700

gtccgggtga tccgacaggc cctcgccgag tccgggctga cgccccacac cgtcgacgtc 14760

gtggagaccc acggcaccgg cacccgcctc ggtgatccga tcgaggcacg ggcgctctcc 14820

gacgcgtacg gcggtgaccg tgagcacccg ctgcggatcg gctcggtcaa gtccaacatc 14880

gggcacaccc aggccgccgc cggtgtcgcc ggtctgatca aactggtgtt ggcgatgcag 14940

gccggtgtcc tgccccgcac cctgcacgcc gacgagccgt caccggagat cgactggtcc 15000 

tcgggcgcga tcagcctgct ccaggagccc gctgcctggc ccgccggcga gcggccccgc 15060 

cgggccgggg tgtcctcgtt cggcatcagc ggcaccaacg cacacgcgat catcgaggag 15120 

gcgccgccga ccggtgacga cacccgaccc gaccggatgg gcccggtggt gccctgggtg 15180 

ctctcggcga gcaccggcga ggcgttgcgc gcccgggcgg cgcggctggc cgggcaccta 15240

cgcgagcacc ccgaccagga cctggacgac gtcgcctact cgctggccac cggtcgggcc 15300 

gcgctggcgt accgtagtgg gttcgtgccc gccgacgcgt ccacggcgct gcggatcctc 15360

gacgaactcg ccgccggtgg atccggggac gcggtgaccg gcaccgcccg cgccccgcag 15420 

cgcgtcgtct tcgtcttccc cggccaggga tggcagtggg cggggatggc agtcgacctg 15480 

ctcgacggcg acccggtctt cgcctcggtg ctgcgggagt gcgccgacgc gttggaaccg 15540 

tacctggact tcgagatcgt cccgttcctg cgggccgagg cgcagcgccg gacccccgac 15600 

cacacgctct ccaccgaccg cgtcgacgtg gtccagccgg tgctgttcgc ggtgatggtg 15660 

tccctggcgg cccggtggcg ggcgtacggg gtggaaccgg cggccgtcat cggacactcc 15720 

cagggggaga ttgccgcggc gtgtgtggcc ggggcgctct cgctggacga cgcggcccgg 15780 

gcggtggccc tgcgcagccg ggtcatcgcc accatgcccg gcaacggcgc gatggcctcg 15840 

atcgccgcct ccgtcgacga ggtggcggcc cggatcgacg ggcgggtcga gatcgccgcc 15900 

gtcaacggtc cgcgcgcggt ggtggtctcc ggcgaccgtg acgacctgga ccgcctggtc 15960 

gcctcctgca ccgtcgaggg ggtgcgggcc aagcggctgc cggtggacta cgcgtcgcac 16020 

tcctcgcacg tcgaggccgt ccgtgacgcg ctccacgccg aactcggcga gttccggccg 16080 

ctgccgggct tcgtgccgtt ctactcgaca gtcaccggcc gctgggtcga gcccgccgaa 16140 

ctcgacgccg ggtactggtt tcgcaacctg cgccacaggg tccggttcgc cgacgcggtc 16200 

cgctccctcg ccgaccaggg gtacacgacg ttcctggagg tcagcgccca cccggtgctc 16260 

accacggcga tcgaggagat cggtgaggac cgtggcggtg acctcgtcgc tgtccactcg 16320 

ctgcgacgtg gggccggcgg tcccgtcgac ttcggctccg cgctggcccg cgccttcgtg 16380 

gccggcgtcg cagtggactg ggagtcggcg taccagggtg ccggggcgcg tcgggtgccg 16440 

ctgcccacgt acccgttcca gcgtgagcgc ttctggttgg aaccgaatcc ggcccgcagg 16500 

gtcgccgact ccgacgacgt ctcgtccctg cggtaccgca tcgaatggca cccgaccgat 16560 

ccgggtgagc cgggacggct cgacggcacc tggctgctgg cgacgtaccc cggtcgggcc 16620

gacgaccggg tcgaggcggc gcggcaggcg ctggagtccg ccggggcgcg ggtcgaggac 16680 

ctggtggtgg agccccggac gggccgggtc gacctggtgc ggcggctcga cgccgtgggt 16740 

ccggtggcgg gcgtgctctg cctgttcgct gtcgcggagc cggcggccga acactccccg 16800 

ctggcggtga cgtcgttgtc ggacacgctc gacctgaccc aggcggtggc cgggtcgggc 16860 

cgggagtgtc cgatctgggt ggtcaccgag aacgccgtcg ccgtcgggcc cttcgaacgg 16920 

ctccgcgacc cggcccacgg cgcgctctgg gccctcggtc gggtcgtcgc cctggagaac 16980 

cccgccgtct ggggcggcct ggtcgacgtg ccgtcgggtt cggtcgccga gctgtcgcgt 17040 

cacctcggga cgaccctgtc cggcgccggc gaggaccagg tcgccctccg acccgacggg 17100 

acgtacgccc gccggtggtg cagggcgggc gcgggcggca cgggccggtg gcagccccgg 17160 

ggcacggtgc tcgtcaccgg cggcaccggc ggggtcggtc ggcacgtcgc ccggtggctg 17220 

gcccgccagg gcaccccgtg cctggtgctg gccagccgcc ggggaccgga cgccgacggg 17280 

gtcgaggagc tactcaccga actcgccgac ctgggcaccc gggccaccgt caccgcctgc 17340 

gacgtcaccg accgggagca gctccgtgcc ctcctcgcga ccgtcgacga cgagcacccg 17400 

ctgtcggcgg tgttccacgt cgccgcgacg ctcgacgacg gcaccgtcga gaccctcacc 17460 

ggtgaccgca tcgaacgggc caaccgggcg aaggtgctcg gtgcccgcaa cctgcacgag 17520 

ctgacccggg acgccgacct cgacgcgttc gtgctcttct cctcctccac cgccgcgttc 17580 

ggcgcgccgg ggctcggcgg ctacgtcccg ggcaacgcct acctcgacgg tctcgcccag 17640 

cagcgacgca gcgagggact cccggccacc tcggtggcgt ggggtacctg ggcgggcagc 17700 

gggatggccg agggtccggt cgccgaccgg ttccgccggc acggggtcat ggagatgcac 17760 

cccgaccagg ccgtcgaggg tctccgggtg gcactggtgc agggtgaggt agccccgatc 17820 

gtcgtcgaca tcaggtggga ccggttcctc ctcgcgtaca ccgcgcagcg ccccacccgg 17880 

ctcttcgaca ccctcgacga ggcccgtcgg gccgcgcccg gtcccgacgc cgggccgggg 17940 

gtggcggcgc tggccgggct gcccgtcggg gaacgcgaga aggcggtcct cgacctggta 18000 

cggacgcacg cggctgccgt cctcggccac gcctcggccg agcaggtgcc cgtcgacagg 18060 

gccttcgccg aactcggcgt cgactcgctg tcggccctgg aactgcgcaa ccggctgacc 18120 

actgcgaccg gggtccggct ggccacgacg acggtcttcg accacccgga cgtacggacc 18180 

ctggccggac acctggccgc cgaactgggc ggcggatcgg ggcgggagcg gcccgggggc 18240 

gaggccccga cggtggcccc gaccgacgag ccgatcgcca tcgtcgggat ggcctgccgg 18300 

ctgccggggg gagtggactc accggagcag ctgtgggagt tgatcgtctc cgggcgggac 18360

accgcctcgg cggcacccgg ggaccggagc tgggatccgg cggagttgat ggtctccgac 18420

acgacgggca cccgtaccgc cttcggcaac ttcatgcccg gggcgggcga gttcgacgcg 18480

gcgttcttcg ggatctcgcc gcgtgaggcg ttggcgatgg atccgcagca gcggcacgcc 18540 

ctggagacca cctgggaggc gctggagaac gccggtatcc ggcccgagtc gttgcgcggt 18600 

acggacaccg gtgtcttcgt gggcatgtcc catcaggggt acgccaccgg ccgcccgaag 18660 

cccgaggacg aggtcgacgg ctacctgttg acaggcaaca ccgcgagcgt cgcctccggt 18720 

cggatcgcgt acgtgttggg gttggagggg ccggcgatca ctgtggacac ggcgtgttcg 18780 

tcgtcgcttg tggcgttgca cgtggcggcg ggttcgttgc gttctgggga ctgtggtctg 18840 

gcggtggcgg gtggggtgtc ggtgatggcc ggtccggagg tgttcaggga gttctcccgg 18900 

cagggcgcgt tggctccgga cggcaggtgc aagcccttct cggacgaggc cgacggcttc 18960 

ggtctggggg aggggtcggc cttcgtcgtg ttgcagcggt tgtcggtggc ggtgcgggag 19020 

gggcgtcggg tgttgggtgt ggtggtgggt tcggcggtga atcaggatgg ggcgagtaat 19080 

gggttggcgg cgccgtcggg ggtggcgcag cagcgggtga ttcggcgggc gtggggtcgt 19140 

gcgggtgtgt cgggtgggga tgtgggtgtg gtggaggcgc atgggacggg gacgcggttg 19200 

ggggatccgg tggagttggg ggcgttgttg gggacgtatg gggtgggtcg gggtggggtg 19260 

ggtccggtgg tggtgggttc ggtgaaggcg aatgtgggtc atgtgcaggc ggcggcgggt 19320 

gtggtgggtg tgatcaaggt ggtgttgggg ttgggtcggg ggttggtggg tccgatggtg 19380 

tgtcggggtg ggttgtcggg gttggtggat tggtcgtcgg gtgggttggt ggtggcggat 19440

ggggtgcggg ggtggccggt gggtgtggat ggggtgcgtc ggggtggggt gtcggcgttt 19500 

ggggtgtcgg ggacgaatgc tcatgtggtg gtggcggagg cgccggggtc ggtggtgggg 19560 

gcggaacggc cggtggaggg gtcgtcgcgg gggttggtgg gggtggttgg tggtgtggtg 19620 

ccggtggtgc tgtcggcaaa gaccgaaacc gccctgcacg cccaggcacg tcgactcgcc 19680 

gaccacctgg agacgcaccc cgacgtcccg atgaccgacg tggtgtggac gctgacgcag 19740 

gcccgccaac gcttcgacag gcgcgcggtc ctcctcgccg ccgaccggac ccaggccgtg 19800 

gaacggctgc gcggcctcgc cgggggcgaa ccggggaccg gtgtggtgtc gggggtggcg 19860 

tcgggtggtg gtgtggtgtt tgtttttcct ggtcagggtg gtcagtgggt ggggatggcg 19920 

cgggggttgt tgtcggttcc ggtgtttgtg gagtcggtgg tggagtgtga tgcggtggtg 19980 

tcgtcggtgg tggggttttc ggtgttgggg gtgttggagg gtcggtcggg tgcgccgtcg 20040 

ttggatcggg tggatgtggt gcagccggtg ttgttcgtgg tgatggtgtc gttggcgcgg 20100 

ttgtggcggt ggtgtggggt tgtgcctgcg gcggtggtgg gtcattcgca gggggagatc 20160 

gcggcggcgg tggtggcggg ggtgttgtcg gtgggtgatg gtgcgcgggt ggtggcgttg 20220 

cgggcgcggg cgttgcgggc gttggccggc cacggcggca tggcctcggt acgccgaggc 20280 

cgcgacgacg tacagaagct cctcgacagc ggcccctgga cggggaagct ggagatcgcc 20340 

gcggtcaacg gccccgacgc ggtggtggtc tccggcgacc cccgagccgt gaccgagctg 20400 

gtcgagcact gtgacgggat cggggtccgg gcccggacga tccccgtcga ctacgcctcc 20460

cactccgcac aggtcgagtc gctccgggag gagctgctct ccgtcctggc cgggatcgag 20520 

ggccgcccgg cgacggtgcc gttctactcc accctcaccg gtgggttcgt cgacggcacc 20580 

gaactggacg ccgactactg gtaccgcaac ctgcgccacc cggtgcggtt ccacgccgcc 20640 

gtcgaggcgc tggcagcgcg tgacctcacc acgttcgtcg aggtcagccc gcaccccgtg 20700 

ctgtcgatgg cggtcgggga gacgcttgcc gacgtggagt ccgccgtcac tgtgggcacc 20760 

ctggaacgcg acaccgacga cgtcgagcgc ttcctcacct ccctcgccga ggcgcacgtc 20820 

cacggcgtac ccgtggactg ggcggcggtc ctcggctccg gaaccctggt cgacctgccc 20880 

acctatccct tccagggacg gcggttctgg ctgcaccccg accgtggtcc gcgtgacgat 20940 

gtcgccgact ggttccaccg ggtcgactgg acggcgacgg ccaccgacgg gtcggcccga 21000 

ctcgacggtc gctggctggt ggtcgtaccc gaggggtaca cggacgacgg ctgggtcgtg 21060 

gaggtgcggg ccgccctcgc cgccggtggt gccgagccgg tggtgacgac ggtcgaggag 21120 

gtcaccgacc gggtcggtga cagcgacgcg gtggtgtcga tgctcgggct ggccgacgac 21180 

ggtgcggccg agaccctggc gctgctgcga cgactcgacg cacaggcgtc caccacccca 21240 

ctgtgggtgg tcaccgtggg ggccgtcgcc cccgccggtc cggtgcagcg ccccgaacag 21300 

gcgacggtgt gggggttggc ccttgtcgcc tccctggaac gcggacaccg gtggaccggc 21360 

ctgctggatc tgccgcagac accggacccg cagctacgac cccggctggt cgaggcgctc 21420 

gccggtgccg aggaccaggt agcggtccgc gccgacgccg tacacgcccg tcggatcgtc 21480 

cccaccccgg tcaccggagc cgggccgtac accgccccgg gcgggacgat cctcgtcacc 21540 

gggggcaccg ccggtctggg tgccgtcacc gcccgatggc tcgccgagcg cggtgccgaa 21600 

cacctcgccc tggtcagccg gcgcgggccg ggcaccgccg gcgtcgacga ggtggtccgg 21660 

gacctgaccg ggctcggcgt acgggtgtcg gtgcactcct gcgacgtcgg cgaccgcgag 21720 

tcggtcggcg ccctggtgca ggagttgaca gcagccggtg acgtggtccg gggggtggtc 21780 

cacgctgccg gtctgcccca gcaggtgcca ctgaccgaca tggacccggc cgacctcgcc 21840 

gacgtggtgg ccgtgaaggt cgacggcgcg gtgcacctgg ccgacctgtg cccggaggcc 21900 

gaactgttcc tgctgttctc ctccggggcc ggggtgtggg gcagtgcccg tcagggtgcg 21960 

tacgccgccg gaaacgcctt cctggacgcc ttcgcccgac accggcggga ccggggtctg 22020 

cccgccacct cggtggcgtg ggggctctgg gcggccgggg ggatgacagg ggaccaggag 22080

gcggtgtcgt tcctgcgtga gcggggcgta cggccgatgt cggtgccgag ggcactggaa 22140

gcgctggaac gggtcctcac cgccggggag accgcggtgg tcgtcgccga cgtcgactgg 22200 

gcggccttcg ccgagtcgta cacctccgcc cggccccggc cgctgctcca ccggctcgtc 22260 

acacctgcgg cggcggtcgg cgagcgcgac gagccgcgtg agcagaccct ccgggaccgg 22320 

ctggcggccc tgccccgggc cgagcggtcg gcggagctgg tacgcctggt ccggcgggac 22380 

gccgcagccg tgctcggcag cgacgcgaag gccgtacccg ccaccacgcc gttcaaggac 22440 

ctcgggttcg actcgctggc cgcggtccgg ttccgtaacc ggctggccgc ccacaccggt 22500 

ctgcgtctgc cggccaccct ggtcttcgag cacccgaacg ccgcagccgt cgccgacctc 22560 

ctccacgacc gactcggcga ggccggcgag ccgacccccg tccggtcggt gggcgccgga 22620 

ctggccgcgc tggagcaggc cctgcccgac gcctccgaca cggagcgggt cgagctggtc 22680 

gagcgcctgg aacggatgct cgccgggctc cgccccgagg ccggagccgg ggccgacgcc 22740 

ccgaccgccg gtgacgacct gggggaggcc ggcgtcgacg aactcctcga cgcgctcgaa 22800 

cgggaactcg acgccaggtg aacccgaact gaccgcagcc gcagccgaag cagagaccga 22860 

ggacctgtga ctgacaacga caaggtggcg gagtacctcc gtcgtgcgac gctcgacctg 22920 

cgggccgccc gcaagcgcct gcgcgagctg caatccgacc cgatcgcggt cgtcggcatg 22980 

gcctgccgcc taccgggcgg ggtgcacctc ccgcagcacc tgtgggacct cctgcgccag 23040 

gggcacgaga cggtgtccac cttccccacc gggcgcggct gggacctggc cgggctcttc 23100 

cacccggacc ccgaccaccc cggcaccagc tacgtcgacc ggggtgggtt cctcgacgac 23160 

gtggcgggct tcgacgccga gttcttcggg atctccccgc gcgaggccac ggccatggac 23220 

ccgcaacagc ggctgctgtt ggagaccagt tgggagctgg tggagagcgc cggcatcgat 23280 

ccgcactccc tgcgtggcac cccgaccggc gtcttcctcg gcgtggcgcg gctcggctac 23340 

ggcgagaacg gcaccgaagc cggtgacgcc gagggctatt cggtgaccgg ggtggcaccc 23400 

gctgtcgcct ccgggcggat ctcctacgcc ctcgggctgg agggtccgtc gatcagcgtg 23460 

gacaccgcgt gctcgtcgtc gttggtggcg ctgcacctgg cggtcgagtc gctgcggctg 23520 

ggcgagtcga gtctcgctgt cgtcggcggg gcggcggtca tggcgacacc aggggtgttc 23580 

gtcgacttca gccgccagcg ggcgttggcc gctgacggca ggtcgaaggc cttcggggcc 23640 

gccgccgacg ggttcggctt ctccgagggg gtctccctcg tcctgctcga acggctctcc 23700 

gaggccgaaa gcaacggcca cgaggtgttg gctgtcatcc gtggctccgc cctcaaccag 23760 

gacggggcca gcaacggtct cgccgcgccg aacgggaccg cccagcgcaa ggtgatccgg 23820 

caggcgctac gaaactgcgg cctgaccccg gccgacgtgg acgccgtgga ggcgcacggc 23880 

accggcacca cgctcggcga cccgatcgag gccaacgccc tgctggacac ctacggccgt 23940

gaccgggatc cggaccaccc gctgtggctg gggtcggtga agtcgaacat cggccacacg 24000 

caggcggcgg cgggcgtcac cgggctgctc aagatggtgc tggcactgcg ccacgaggaa 24060 

ctgcccgcca ccctgcacgt cgacgagccc accccgcacg tggactggtc ctcgggagcg 24120

gtacgcctgg cgacccgggg ccggccgtgg cggcggggtg accggccgag gcgggccggg 24180 

gtgtcggcgt tcggcatcag cgggaccaac gcccacgtga tcgtcgagga ggcacccgag 24240 

cggaccaccg agcgcaccgt cggcggcgac gtcggcccgg tcccgctcgt ggtgtccgcc 24300 

cggtcggcgg cggcgctacg ggcccaggcg gcccaggtcg ccgagctggt ggagggctcc 24360 

gacgtcgggc tggcggaggt cgggcggagc ctggccgtga cccgggcgcg acacgagcac 24420

cgggcggcgg tggtggcgtc gacccgggcc gaggcggtgc gggggctgcg cgaggtcgcg 24480 

gcggtcgaac cgcgcggcga ggacaccgtc accggggtcg ccgagacgtc cgggcgcacc 24540 

gtcgtcttcc tcttcccggg acaggggtcc cagtgggtcg ggatgggcgc ggagctgctg 24600

gactcggcac cggcgttcgc cgacacgatc cgcgcctgcg acgaggcgat ggcaccgttg 24660 

caggactggt cggtctccga cgtgctccgg caggagccgg gggcaccggg actggaccgg 24720

gtcgacgtgg tgcagccggt gctgttcgcg gtgatggtgt cgttggcgcg gttgtggcag 24780 

tcgtacgggg tcacccccgc tgcggtggtg gggcactcgc agggggagat cgccgccgcc 24840 

cacgtggcgg gtgcgctctc cctcgccgac gcggcgaggc tggtggtggg ccgcagccgg 24900 

ttgctgcggt cgctgtccgg gggcggcggc atgagcgccg tcgcgctcgg tgaggccgag 24960 

gtacgccgcc gactgcggtc gtgggaggac cggatctccg tggccgccgt caacggaccc 25020 

cggtcggtgg tggtggccgg ggaaccggag gcgctgcggg agtggggacg ggagcgggag 25080 

gccgagggcg tacgggtccg cgagatcgac gtcgactacg cctcgcactc gccgcagatc 25140 

gacagggtcc gtgacgaact cctgacggtc acgggggaga tcgagccccg gtcggcggag 25200 

atcaccttct actcgacggt cgacgtccgt gctgtcgacg gcaccgacct ggacgcgggg 25260 

tactggtacc gcaacctgcg ggagacggtc cggttcgccg acgcgatgac ccggttggcc 25320 

gactcgggat acgacgcgtt cgtcgaggtc agcccgcatc cggtggtggt gtcggcggtc 25380 

gccgaggcgg tcgaggaggc aggtgtcgag gacgccgtcg tcgtcggcac cctgtcccgg 25440 

ggcgacggcg gaccgggggc gttcctgcgg tcggcggcca ccgcccactg cgccggtgtg 25500 

gacgtcgact ggacgcccgc cctcccggga gctgcgacga tcccgttgcc gacgtacccg 25560 

ttccaacgga agccgtactg gctgcggtcg tctgctcccg cccccgcctc ccacgatctc 25620 

gcctaccggg tgtcctggac gccgatcacc ccgcccgggg acggcgtact cgacggcgac 25680

tggctggtgg tgcaccccgg gggcagcacc ggatgggtcg acgggttggc ggcggcgatc 25740

accgccggcg gtggccgggt cgtcgcccac ccggtggact ccgtgacctc ccggaccggc 25800

ctggccgagg cgctcgcccg gcgggacggc acgttccggg gggtgctgtc gtgggtggcg 25860 

accgacgaac ggcacgtcga ggccggtgcg gtcgccctgc tgaccctggc gcaggcgttg 25920 

ggtgacgccg gaatcgacgc accactgtgg tgcctgaccc aggaggcggt ccgtaccccc 25980 

gtcgacggtg acctggcccg accggcgcag gccgccctgc acggtttcgc ccaggtcgcc 26040 

cggctggagc tggcccgccg cttcggtggg gtgctcgacc tgcccgccac cgtcgacgcc 26100 

gccgggacgc gtctggtcgc ggcggtcctc gccggcggcg gcgaggacgt cgtcgccgtc 26160 

cgtggcgacc gtctctacgg ccgtcgcctg gtcagggcga ccctgccgcc gcccggcggg 26220 

gggttcaccc cgcacggcac cgtcctggtc accggcgcgg ccggtccggt gggcggtcgg 26280 

ctggcccggt ggctcgccga acggggtgcc acccgactcg tcctgcccgg cgcacacccg 26340 

ggcgaggagt tgctgaccgc gatccgggcc gccggtgcca ccgccgtggt gtgcgaaccg 26400 

gaggcggagg cactgcgtac ggcgatcggc ggggagttgc cgaccgcgct cgtacacgcc 26460

gagacgttga cgaacttcgc cggcgtcgcc gacgccgacc ccgaggactt cgccgccacc 26520 

gtcgcggcga agaccgcgct gccgacggtc ctggcggagg tgctcggcga ccaccgcctc 26580 

gaacgggagg tctactgctc gtcggtggcc ggggtctggg gtggggtcgg catggccgcg 26640 

tacgccgccg gcagcgccta cctcgacgcc ctggtcgagc accgtcgcgc ccgggggcac 26700 

gccagcgcct cggtggcctg gaccccgtgg gccctgcccg gcgcggtcga cgacggtcgg 26760 

ctgcgcgagc gcggcctgcg cagcctcgac gtggccgacg ccctcgggac gtgggaacgt 26820 

ctgctccgcg ccggtgcggt gtcggtggcc gtcgccgacg tcgactggtc ggtcttcaca 26880 

gagggtttcg cggccatccg gccgaccccg ctcttcgacg aactcctcga ccggcgcggg 26940

gaccccgacg gcgcgcccgt cgaccggccg ggggagccgg cgggcgagtg gggtcgacga 27000

atcgcggcgc tgtccccgca ggaacagcgg gagacgttgc tgaccctcgt cggcgagacg 27060 

gtcgcggagg tgctgggaca cgagaccggc accgagatca acacccgtcg ggccttcagc 27120 

gaactcggcc tcgactcgct gggctcgatg gccctgcgtc agcgcctggc ggcccgtacc 27180 

ggcctgcgga tgccggcctc gctggtcttc gaccacccga cggtcaccgc gctcgcgcgg 27240 

tacctgcgtc gactggtcgt cggggactcc gacccgaccc cggtacgggt gttcggcccc 27300 

accgacgagg ccgaacccgt cgccgtggtc ggcatcggct gccggttccc cggcggcatc 27360

gccacccccg aggacctctg gcgggtggtg tccgagggca cctccatcac caccggattc 27420

cccaccgacc ggggctggga cctccggcgg ctctaccacc ccgacccgga ccaccccggc 27480 

accagctacg tcgacagggg gggattcctc gacggggccc cggacttcga ccccgggttc 27540 

ttcgggatca ccccccgcga ggcgctggcg atggacccgc agcagcggct caccctggag 27600 

atcgcgtggg aggcggtgga acgggcgggc atcgacccgg agaccctcct cggcagcgac 27660 

accggcgtct tcgtcggcat gaacggccag tcctacctgc aactgctgac cggggagggt 27720 

gaccggctca acggctacca ggggttgggc aactcggcga gcgtgctctc cggccgtgtc 27780 

gcctacacct tcgggtggga ggggccggcg ctgacggtgg acaccgcctg ctcgtcctcg 27840 

ctggtcgcca tccacctcgc catgcagtcg ctgcgtcggg gtgagtgctc gctggcgttg 27900 

gccggcgggg tgacggtcat ggccgacccg tacaccttcg tggacttcag cgcacagcgg 27960

gggctcgccg ccgacgggcg gtgcaaggcg ttctccgcgc aggccgacgg gttcgccctc 28020 

gccgagggcg tcgcggcgct cgtcctcgaa ccgttgtcca aggcgcggcg aaacggccac 28080 

caggtgctgg cggtgctgcg cggcagcgcc gtcaaccagg acggggccag caacggcctc 28140 

gccgccccga acgggccgtc gcaggaacgg gtgatcaggc aggccctgac cgcctccggg 28200 

ctgcgtcccg ccgacgtcga catggtggag gcgcacggga cgggcaccga actcggcgac 28260 

ccgatcgagg ccggggcgct catcgcggcg tacggccggg accgggaccg gccgctctgg 28320 

ctgggctcgg tgaagacgaa catcggccac acccaggccg ccgccggtgc cgccggggtg 28380 

atcaaggcgg tcctggcgat gcggcacggc gtactcccga ggtcgctgca cgccgacgag 28440 

ttgtccccgc acatcgactg ggcggacggg aaggtcgagg tgctccgcga ggcacgacag 28500 

tggccccccg gtgagcgccc ccgccgcgcc ggggtgtcct ccttcggcgt cagcgggacc 28560 

aacgcccacg tcatcgtcga ggaggcaccc gccgaaccgg accccgaacc ggttcccgcc 28620 

gccccgggcg ggcccctgcc cttcgtcctg cacggacgca gcgtccagac ggtccggtcc 28680 

caggcgcgga ccctcgccga acacctgcgc accaccggcc accgggacct cgccgacacc 28740 

gcccgtaccc tggccaccgg tcgcgcccgt ttcgacgtcc gggccgcagt gctcggcacc 28800 

gaccgggagg gtgtctgcgc cgccctcgac gcgctggcgc aggatcgccc ctcgcccgac 28860 

gtcgtcgccc cggcggtctt cgccgcccgt acccccgtcc tggtcttccc cgggcagggg 28920 

tcgcagtggg tcggcatggc ccgtgacctg ctcgactcct ccgaggtgtt cgccgagtcg 28980 

atgggccggt gcgccgaggc gctgtcgccg tacaccgact gggacctgct cgacgtggtc 29040 

cgtggggtcg gcgaccccga cccgtacgac cgggtggacg tgctccagcc ggtgctgttc 29100 

gcggtgatgg tgtcgctggc gcggttgtgg cagtcgtacg gggtgactcc gggtgcggtg 29160 

gtgggtcact cgcaggggga gatcgccgcc gcgcacgtgg ctggtgcgtt gtcgttggcc 29220 

gacgccgcca gggtggtggc gttgcgcagc cgggtgctgc gggagctcga cgaccagggc 29280 

ggcatggtgt cggtcggcac ctcccgcgcc gagttggact cggtcctgcg ccggtgggac 29340

gggcgggtcg cggtggcggc ggtgaacgga cccggcacgc tcgtggtggc cggacccacc 29400

gccgaactgg acgagttcct cgcggtggcc gaggcccgcg agatgaggcc gcgtcggatc 29460

gcggtgcgct acgcgtcgca ctccccggag gtggcccggg tcgaacagcg gctcgccgcc 29520 

gaactcggca ccgtcaccgc cgtcggcggc acggtcccgc tctactccac cgccaccggg 29580 

gacctcctcg acaccacagc catggacgcc gggtactggt accgcaacct gcgccaaccg 29640 

gtgctgttcg agcacgccgt ccgcagcctc ctggagcggg gattcgagac gttcatcgag 29700 

gtcagcccgc accctgtgct gctgatggcg gtcgaggaga ccgccgagga cgccgagcgc 29760 

ccggtcaccg gcgtgccgac gctgcgccgc gaccacgacg ggccgtcgga gttcctccgc 29820 

aacctcctgg gggcgcacgt gcacggggtc gacgtcgacc tgcgtccggc ggtcgcccac 29880 

ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc agcggctctg gcccaagccg 29940 

caccgcaggg ccgacacctc gtcgctgggg gtccgtgact cgacccaccc gctgctgcac 30000 

gccgcagtcg acgtacccgg tcacggcgga gcggtgttca ccgggcggct ctcccccgac 30060 

gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga acctggtgcc cggcagtgtc 30120 

ctggtcgacc tcgcgctcac cgccggggcc gacgtcggcg tgccggtgct ggaggaactc 30180 

gtcctgcagc agccgctggt gttgaccgcc gccggtgcgt tgctgcgcct gtcggtcggc 30240 

gccgccgacg aggacgggcg gcggccggtc gagatccacg ccgccgagga cgtctccgac 30300 

ccggccgagg cccggtggtc ggcgtacgcg accgggaccc tcgccgtcgg cgtggccggc 30360 

ggcggccggg acggcacaca gtggcccccg cccggcgcca ccgccctgac gttgaccgac 30420

cactacgaca ccctcgccga actgggctac gagtacgggc cggcgttcca ggcgctgcgc 30480 

gccgcgtggc agcacggcga cgtggtctac gcggaggtgt ccctcgacgc cgtcgaggag 30540 

gggtacgcgt tcgacccggt gctgctcgac gccgtcgccc agaccttcgg cctgaccagt 30600 

cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca ccctgcacgc caccggggcc 30660 

actgcggtac gggtggtggc gacccccgcc ggaccggacg cggtggccct gcgggtcacc 30720 

gacccgaccg gtcagctcgt cgccacggtg gacgccctgg tcgtcaggga cgccggggcg 30780 

gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc gcctggagtg ggtacggctg 30840 

gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg cggccgacgg gctcgacgac 30900 

ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg tccgctaccg tcccgacggc 30960 

gacgacccga cggccgaggc ccgtcacggg gtgctctggg cggccacgct cgtgcgccgt 31020 

tggctcgacg acgaccggtg gcccgccacc accctggtgg tggccacgtc cgcaggggtc 31080 

gaggtctccc ccggggacga cgtgccgcgc cccggggccg ccgccgtgtg gggggtgctg 31140 

cgctgcgccc aggcggagtc cccggaccgc ttcgtgctcg tcgacggcga cccggagacg 31200 

cccccggcgg tgccggacaa tccgcagctc gcggtccgtg acggtgcggt gttcgtgcca 31260 

cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg accgggcgta ccggctggtg 31320

cccggcaacg gcggctccat cgaggcagtg gccttcgccc ccgtccccga cgccgaccgg 31380 

cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca ccggcgtgaa cttccgtgac 31440 

gtcctgctcg cgctcggcat gtacccggaa ccggccgaga tgggcaccga ggcgtccggt 31500 

gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc ccggccaggc ggtgacgggc 31560 

ctgttccagg gggccttcgg gccggtggcg gtcgccgacc accggctcct caccccggtc 31620 

cccgacgggt ggcgggcggt ggacgccgca gccgtaccca tcgcgttcac caccgcccac 31680 

tacgcgctgc acgacctggc cgggttgcag gccgggcagt ccgtgctggt ccacgccgcc 31740 

gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc gggccggggc ggaggtgttc 31800 

gccacggcca gcccggccaa acacccgacg ctgcgggcgc tcggcctcga cgacgaccac 31860

atcgcctcgt cccgggagag cgggttcggt gagcggttcg ccgcgcgtac cggggggcgg 31920 

ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc tcgacgagtc cgcgcggctg 31980 

ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg acctgcggcc ggcggagcag 32040

ttccggggcc ggtacgtccc gttcgacctg gccgaggccg gtcccgatcg gctcggcgag 32100 

atcctggagg aggtcgtcgg tctgctggcc gccggtgccc tcgaccggtt gccggtgtcg 32160 

gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca tgagccgggg ccgacacgtg 32220 

ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg acggaacggt gctggtcacc 32280 

ggcgggaccg gcaccctggg gcggctggtc gcccgccacc tggtgaccgg gcacggcgta 32340 

ccccacctcc tggtggccag ccggcgcggt ccggcggccc cgggcgcggc cgagctgcgc 32400

gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg cctgcgacac cgccgaccgg 32460

gaggcgctcg cggcgctgct cgactcgatc cccgcggacc gtccgctgac cggggtggtg 32520 

cacaccgccg gggtcctggc cgacgggctg gtcacctcca tcgacgggac cgccaccgat 32580 

caggtcctgc gggccaaggt cgacgcggcg tggcacctgc acgacctgac ccgggacgcg 32640 

gacctgagct tcttcgtgct gttctcgtcg gcggcgtcgg tgctggccgg tcccgggcag 32700 

ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg ccgggcaacg gcgggccctc 32760

ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc aggccagcga gatgaccagc 32820 

ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc tgccgaccga gcgggcgctg 32880 

gccctgttcg acgcggctct gcgcagcggc ggggaggtgc tgttcccgct gtctgtcgac 32940 

aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc tgcgcggcgc ggtccggtcc 33000

acgccacggg ccgccaacag ggccgagacc ccgggccggg gcctgctcga ccgtctcgtc 33060

ggtgcacccg agaccgatca ggtggccgcg ctggccgagc tggtccgctc gcacgcggcg 33120

gcggtcgccg gctacgactc ggccgaccag ctgcccgaac gcaaggcgtt caaggacctc 33180

gggttcgact cgctggcggc ggtggagctg cgcaaccggc tcggcgtcac caccggcgta 33240

cggctgccca gcacgctggt gttcgaccac ccgacaccgc tggcggtggc cgaacacctg 33300

cggtcggagt tgttcgccga ctccgcgccg gacgtcgggg tcggtgcgcg cctcgacgac 33360

ctggaacggg cgctcgacgc cctgcccgac gcgcagggac acgccgacgt cggggcccgc 33420

ctggaggcgc tgctgcgccg gtggcagagc cgacgacccc cggagaccga gccagtgacg 33480

atcagtgacg acgccagtga cgacgagctg ttctcgatgc tcgacaggcg tctcggcggg 33540

ggaggggacg tctaggtgac aggtcgattc cgccccgcgg cagtggaccg taccgccctg 33600

acaggtccac cgggttcgcg tcgcctccca cacccgacgg ccggggtatc cacggaaggg 33660

atccgatgag cgagagcagc ggcatgaccg aggaccgcct ccggcgctat ctcaagcgca 33720

ccgtcgccga actcgactcg gtgacaggtc ggctcgacga ggtcgagtac cgggcccgcg 33780

aaccgatcgc cgtcgtcggc atggcctgcc ggttccccgg gggtgtggac tcgccggagg 33840

cgttctggga gttcatccgc gacggtggtg acgcgatcgc cgaggcgccc acggaccgtg 33900

gctggccgcc ggcaccgcga ccccgcctcg gtggtctcct cgcggagccg ggcgcgttcg 33960

acgccgcctt cttcggcatc tcaccccgcg aggcgctcgc gacggacccc cagcagcgcc 34020

tgatgctgga gatctcctgg gaggcgttgg agcgtgcggg tttcgacccg tcgagcctgc 34080

gcggcagcgc cggtggcgtc ttcaccggtg tcggtgcggt ggactacgga cccaggccgg 34140

acgaggcacc cgaggaggtg ctcggctacg tcggcatcgg caccgcctcc agcgtcgcct 34200

ccggacgggt ggcgtacacc ctggggttgg agggtccagc cgtcaccgtc gacaccgcct 34260

gctcctccgg gctcaccgcg gtgcacctgg cgatggagtc gctgcgccgc gacgagtgca 34320

ccctggtcct cgccggtggg gtcaccgtga tgagcagccc gggtgcgttc accgagttcc 34380

gcagccaggg cgggttggcc gaggacggcc gctgcaaacc gttctcccgc gccgccgacg 34440

gcttcgggct cgccgagggg gccggggtcc tggtgctcca acggctgtcc gtcgcccggg 34500

ccgagggccg gccggtgctg gccgtactgc gtggctcggc gatcaaccag gacggtgcca 34560

gcaacgggct caccgcgccg agcggccccg cccagcggcg ggtgatcagg caggcgttgg 34620

agcgggcgcg gctgcgtccc gtcgacgtgg actacgtgga ggcccacggc accggcaccc 34680

ggctgggcga tccgatcgag gcgcacgccc tgctcgacac gtacggtgcc gaccgggaac 34740

ccggccgccc gctctgggtc ggatcggtga agtccaacat cggtcacacc caggcggcgg 34800

cgggggtggc cggggtgatg aagaccgtgc tggcgctgcg gcatcgggag atcccggcga 34860

cgttgcactt cgacgagccc tcgccgcacg tcgactggga ccggggtgcg gtgtcggtgg 34920

tgtccgagac ccggccctgg ccggtggggg agcgcccgcg ccgggcgggg gtgtcctcgt 34980

tcggcatcag cggcaccaac gcgcacgtca tcgtcgagga ggcgccgagc ccgcaggcgg 35040

ccgacctcga cccgaccccc ggcccggcaa ccggagcgac ccccggaacg gatgccgccc 35100

ccaccgccga gccgggtgcg gaggcggtcg cactggtgtt ctccgcgcgc gacgagcggg 35160

ccctgcgcgc ccaggcggcc cggctcgccg accgtctcac cgacgacccg gccccctcgt 35220

tgcgcgacac cgccttcacc ctggtcaccc gccgtgccac ctgggagcat cgggcggtcg 35280

tcgtcggcgg gggcgaggag gtcctcgccg gcctccgggc cgtcgccggg ggacgtcccg 35340

tcgacggagc cgtcagcggg cgggcgcgcg ccggccgccg ggtggtgctg gtcttccccg 35400

ggcagggcgc acagtggcag ggcatggccc gggacctgct gcggcagtcg ccgaccttcg 35460

cggagtccat cgacgcctgc gagcgggcgc tcgccccgca cgtggactgg tcgctgcgcg 35520

aggtgctcga cggcgagcag tcgttggacc ccgtcgacgt ggtgcagccg gtgctgttcg 35580

cggtgatggt gtcgttggcg cggttgtggc agtcgtacgg ggtgactccg ggtgcggtgg 35640

tgggtcactc gcagggggag atcgccgccg cgcacgtggc tggtgcgttg tcgttggccg 35700

acgccgccag ggtggtggcg ttgcgcagcc gggtgctgcg ccgtctcggt ggtcacggcg 35760

ggatggcgtc gttcgggctc caccccgacc aggccgccga gcggatcgcg cgcttcgcgg 35820

gtgcgctgac tgtcgcctcg gtcaacggtc cccgttcggt ggtgctggcc ggggagaacg 35880

gcccgttgga cgagctgatc gccgagtgcg aggccgaggg cgtgaccgcc cgtcggatcc 35940

ccgtcgacta cgcctcacac tccccgcagg tggagtcgct gcgtgaggag ctgctcgccg 36000

cactggccgg ggtccgtccg gtgtcggccg ggatccccct gtactcgacc ctgaccggtc 36060

aggtcatcga aacggcgacg atggacgccg actactggtt cgccaacctc cgggagccgg 36120

tgcgcttcca ggacgccacc aggcagctcg ccgaggcggg gttcgacgcc ttcgtcgagg 36180

tcagcccgca cccggtgttg acagtcggtg tcgaggccac cctcgaggca gtgctgcccc 36240

ccgacgcgga tccgtgtgtc acaggcaccc tgcgccgcga acgcggcggt ctcgcgcagt 36300

tccacaccgc gctcgccgag gcgtacaccc ggggggtgga ggtcgactgg cgtaccgcag 36360

tgggtgaggg acgcccggtc gacctgccgg tctacccgtt ccaacgacag aacttctggc 36420

tcccggtccc cctgggccgg gtccccgaca ccggcgacga gtggcgttac cagctcgcct 36480

ggcaccccgt cgacctcggg cggtcctccc tggccggacg ggtcctggtg gtgaccggag 36540

cggcagtacc cccggcctgg acggacgtgg tccgcgacgg cctggaacag cgcggggcga 36600

ccgtcgtgtt gtgcaccgcg cagtcgcgcg cccggatcgg cgccgcactc gacgccgtcg 36660

acggcaccgc cctgtccact gtggtctctc tgctcgcgct cgccgagggc ggtgctgtcg 36720






acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 36780 

tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 36840 

cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 36900 

ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca 36960 

tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 37020 

cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 37080 

tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 37140 

cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 37200

acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg 37260 

ccgaccgcga ccggttggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca 37320 

cggcggtgtt ccacgccgcc gggatctccc ggtccacagc ggtacaggag ctgaccgaga 37380 

gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct 37440

gtcccgagct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg 37500 

ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc 37560 

gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg 37620 

gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg 37680 

cgatcgagga gctgcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc 37740

tggaccggga gcggttcgtc gaactgttca ccgccgcccg ccgccggccc ctcttcgacg 37800

aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc 37860 

ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg 37920

aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc 37980 

gtgacctggg attcgactcc atgaccgccg tcgacctgcg gaaccggctc gcggcggtga 38040 

ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg 38100

cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg 38160 

tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc 38220 

tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg 38280 

cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc 38340 

ccgagcggca cggcaccagc tactcccggc acggcgcgtt cctggacggg gcggccgact 38400 

tcgacgcggc gttcttcggg atctcgccgc gtgaggcgtt ggcgatggat ccgcagcagc 38460 

ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc 38520 

tgcgcggtac ggacaccggt gtcttcctcg gcgctgcgta ccaggggtac ggccagaacg 38580 

cgcaggtgcc gaaggagagt gagggttacc tgctcaccgg tggttcctcg gcggtcgcct 38640 

ccggtcggat cgcgtacgtg ttggggttgg aggggccggc gatcactgtg gacacggcgt 38700 

gttcgtcgtc gcttgtggcg ttgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg 38760 

ggctcgcggt ggcgggtggg gtgtcggtga tggccggtcc ggaggtgttc accgagttct 38820 

ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg 38880 

ggttcggatt cgccgagggc gtcgctgtgg tgctcctgca gcggttgtcg gtggcggtgc 38940 

gggaggggcg tcgggtgttg ggtgtggtgg tgggttcggc ggtgaatcag gatggggcga 39000 

gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg 39060 

gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc 39120 

ggttggggga tccggtggag ttgggggcgt tgttggggac gtatggggtg ggtcggggtg 39180 

gggtgggtcc ggtggtggtg ggttcggtga aggcgaatgt gggtcatgtg caggcggcgg 39240 

cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga 39300 

tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg 39360

cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg 39420 

cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg 39480 

tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg 39540 

tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac 39600 

tgcacgacgc cgtcgacgac accgtcgccc tcccggcggt ggccgccacc ctcgccaccg 39660 

gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg 39720 

acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt 39780 

cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc 39840 

gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt 39900 

cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt 39960 

tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt 40020 

tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg 40080 

cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc 40140 

gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg 40200 

ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca 40260 

actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc 40320

actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg 40380

cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc 40440

gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg 40500 

acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg 40560 

ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg 40620 

ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg 40680

gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact 40740 

qgcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga 40800 

cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg 40860 

actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg 40920 

acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg 40980

ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg 41040 

gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg 41100 

aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc 41160 

acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg 41220

agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg 41280 

gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg 41340 

acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt 41400 

ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 41460 

cgcggtacct ggcccggtcc ggggtgaccg atctcgtcct gctcagcagg agcggccccg 41520 

acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 41580 

tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 41640 

aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 41700 

tcgaccggat cgacgaactg gagtcggtca gcgccgcgaa ggtgaccggg gcgcggctgc 41760

tcgacgagct ctgcccggac gccgacacct tcgtcctgtt ctcctcgggg gcgggagtgt 41820 

ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 41880 

accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 41940 

acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 42000 

cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 42060

tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 42120 

tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 42180 

tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 42240 

tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 42300 

actcgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 42360 

ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 42420 

agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 42480 

ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 42540 

agttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 42600 

tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 42660 

ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 42720 

tcgcgcaacc cgggtacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 42780 

gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac 42840 

actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc 42900 

cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 42960 

cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 43020 

cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 43080

ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 43140

gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 43200

cgatggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 43260 

agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg 43320 

actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 43380 

cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 43440 

cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 43500 

cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 43560

ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 43620

gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct 43680 

ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 43740 

gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 43800

ggcggtgacc gaggcggcga tcgccgcggt gcccggggac ccgcaccggc gggcgctgtt 43860

caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 43920 

ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 43980

ggtgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 44040
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ccgccgccaa 


44100 


ccgacgccga 


44160 


tggtggtcct 


44220 


ccggtggccc 


44280 


cggtcgaact 


44340 


cacctgttcg 


44400 


gtcgtcgcct 


44460 


gtcggcaccg 


44520 


tacgtccgca 


44580 


ctcggcatgc 


44640 


gtcgagggca 


44700 


accttcgccg 


44760 


ggacccgaca 


44820 


gccgcccacc 


44880 


ggccgggtgc 


44940 


ccggtcggga 


45000 


aacggcccgt 


45060 


ctcaccctgg 


45120 


ttgggtgcgc 


45180 


gaaggcgtcg 


45240 


ctgctgccga 


45300 


gccatccacg 


45360 


cagcggaccg 


45420 


cagctccgcg 


45480 


cggatgcggg 


45540 


gggctggtcg 


45600 


gccggtgcgc 


45660 


acgacgccga 


45720 


acaccgccga 


45780 


ggtggttggc 


45840 


tcccaccggg 


45900 


gtgagggctc 


45960 


tggaccgggt 


46020 


gaaaggtctg 


46080 


agcacgccgt 


46140 


cgtcgcgcca 


46200 


tcgccaggcc 


46260 


cacgggcgtc 


46320 


gcagagacct 


46380 


gtgtggcggg 


46440 


cctgcggcgt 


46500 


tcgccgcagc 


46560 


gggaacccgt 


46620 


gggtgagccg 


46680 


ggtagaagtg 


46740 


agcgttgggc 


46800 


cggcccgcag 


46860 


gcagcaccgg 


46920 


cgagcctgcg 


46980 


ggacccgggg 


47040 


cctccaggcg 


47100 


cgaacggaac 


47160 


cggcggtgcc 


47220 


acacgtcgac 


47280 


ccgcgtgcgg 


47340 


accaggtgtt 


47400 


agcaggagcg 


47460 


aagcggtcga 


47520 


atgcagtagt


47580 


aaccggtcgg 


47640


acggtgctgt 


47700 
acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 47760

ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 47820

tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 47880

ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 47940 

cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 47981 

<210> 2 
<211> 48 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 2 

Met Gly Asp Arg Val Asn Gly His Ala Thr Pro Glu Ser Thr Gln Ser

1 5 10 15

Ala Ile Arg Phe Leu Thr Arg His Gly Gly Pro Pro Thr Ala Thr Asp

20 25 30 

Asp Val His Asp Trp Leu Ala His Arg Ala Ala Glu His Arg Leu Glu 
35 40 45 

<210> 3 
<211> 377 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 3 

Met Ala Val Gly Asp Arg Arg Arg Leu Gly Arg Glu Leu Gln Met Ala

1 5 10 15

Arg Gly Leu Tyr Trp Gly Phe Gly Ala Asn Gly Asp Leu Tyr Ser Met 

20 25 30 

Leu Leu Ser Gly Arg Asp Asp Asp Pro Trp Thr Trp Tyr Glu Arg Leu 

35 40 45 

Arg Ala Ala Gly Arg Gly Pro Tyr Ala Ser Arg Ala Gly Thr Trp Val 

50 55 60

Val Gly Asp His Arg Thr Ala Ala Glu Val Leu Ala Asp Pro Gly Phe 
65 70 75 80 

Thr His Gly Pro Pro Asp Ala Ala Arg Trp Met Gln Val Ala His Cys

85 90 95

Pro Ala Ala Ser Trp Ala Gly Pro Phe Arg Glu Phe Tyr Ala Arg Thr

100 105 110

Glu Asp Ala Ala Ser Val Thr Val Asp Ala Asp Trp Leu Gln Gln Arg

115 120 125 

Cys Ala Arg Leu Val Thr Glu Leu Gly Ser Arg Phe Asp Leu Val Asn 

130 135 140 

Asp Phe Ala Arg Glu Val Pro Val Leu Ala Leu Gly Thr Ala Pro Ala 
145 150 155 160 

Leu Lys Gly Val Asp Pro Asp Arg Leu Arg Ser Trp Thr Ser Ala Thr 

165 170 175 

Arg Val Cys Leu Asp Ala Gln Val Ser Pro Gln Gln Leu Ala Val Thr

180 185 190

Glu Gln Ala Leu Thr Ala Leu Asp Glu Ile Asp Ala Val Thr Gly Gly

195 200 205 

Arg Asp Ala Ala Val Leu Val Gly Val Val Ala Glu Leu Ala Ala Asn 

210 215 220 

Thr Val Gly Asn Ala Val Leu Ala Val Thr Glu Leu Pro Glu Leu Ala 
225 230 235 240 

Ala Arg Leu Ala Asp Asp Pro Glu Thr Ala Thr Arg Val Val Thr Glu 

245 250 255 

Val Ser Arg Thr Ser Pro Gly Val His Leu Glu Arg Arg Thr Ala Ala 

260 265 270 

Ser Asp Arg Arg Val Gly Gly Val Asp Val Pro Thr Gly Gly Glu Val
275 280 285

Thr Val Val Val Ala Ala Ala Asn Arg Asp Pro Glu Val Phe Thr Asp

290 295 300

Pro Asp Arg Phe Asp Val Asp Arg Gly Gly Asp Ala Glu Ile Leu Ser
305 310 315 320 

Ser Arg Pro Gly Ser Pro Arg Thr Asp Leu Asp Ala Leu Val Ala Thr 

325 330 335 

Leu Ala Thr Ala Ala Leu Arg Ala Ala Ala Pro Val Leu Pro Arg Leu 

340 345 350 

Ser Arg Ser Gly Pro Val Ile Arg Arg Arg Arg Ser Pro Val Ala Arg

355 360 365 

Gly Leu Ser Arg Cys Pro Val Glu Leu 
370 375 

<210> 4 
<211> 436 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 4 

Met Arg Val Val Phe Ser Ser Met Ala Val Asn Ser His Leu Phe Gly

1 5 10 15

Leu Val Pro Leu Ala Ser Ala Phe Gln Ala Ala Gly His Glu Val Arg

20 25 30 

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Val Thr Gly Ala Gly Leu 

35 40 45 

Thr Ala Val Pro Val Gly Asp Asp Val Glu Leu Val Glu Trp His Ala 

50 55 60 

His Ala Gly Gln Asp Ile Val Glu Tyr Met Arg Thr Leu Asp Trp Val
65 70 75 80

Asp Gln Ser His Thr Thr Met Ser Trp Asp Asp Leu Leu Gly Met Gln

85 90 95

Thr Thr Phe Thr Pro Thr Phe Phe Ala Leu Met Ser Pro Asp Ser Leu

100 105 110

Ile Asp Gly Met Val Glu Phe Cys Arg Ser Trp Arg Pro Asp Trp Ile

115 120 125

Val Trp Glu Pro Leu Thr Phe Ala Ala Pro Ile Ala Ala Arg Val Thr

130 135 140

Gly Thr Pro His Ala Arg Met Leu Trp Gly Pro Asp Val Ala Thr Arg
145 150 155 160

Ala Arg Gln Ser Phe Leu Arg Leu Leu Ala His Gln Glu Val Glu His

165 170 175

Arg Glu Asp Pro Leu Ala Glu Trp Phe Asp Trp Thr Leu Arg Arg Phe

180 185 190

Gly Asp Asp Pro His Leu Ser Phe Asp Glu Glu Leu Val Leu Gly Gln

195 200 205

Trp Thr Val Asp Pro Ile Pro Glu Pro Leu Arg Ile Asp Thr Gly Val

210 215 220 

Arg Thr Val Gly Met Arg Tyr Val Pro Tyr Asn Gly Pro Ser Val Val 
225 230 235 240 

Pro Ala Trp Leu Leu Arg Glu Pro Glu Arg Arg Arg Val Cys Leu Thr 

245 250 255 

Leu Gly Gly Ser Ser Arg Glu His Gly Ile Gly Gln Val Ser Ile Gly

260 265 270

Glu Met Leu Asp Ala Ile Ala Asp Ile Asp Ala Glu Phe Val Ala Thr

275 280 285

Phe Asp Asp Gln Gln Leu Val Gly Val Gly Ser Val Pro Ala Asn Val

290 295 300

Arg Thr Ala Gly Phe Val Pro Met Asn Val Leu Leu Pro Thr Cys Ala

305 310 315 320

Ala Thr Val His His Gly Gly Thr Gly Ser Trp Leu Thr Ala Ala Ile

325 330 335

His Gly Val Pro Gln Ile Ile Leu Ser Asp Ala Asp Thr Glu Val His

340 345 350

Ala Lys Gln Leu Gln Asp Leu Gly Ala Gly Leu Ser Leu Pro Val Ala

355 360 365

Gly Met Thr Ala Glu His Leu Arg Gly Ala Ile Glu Arg Val Leu Asp

370 375 380

Glu Pro Ala Tyr Arg Leu Gly Ala Glu Arg Met Arg Asp Gly Met Arg
385 390 395 400

Thr Asp Pro Ser Pro Ala Gln Val Val Gly Ile Cys Gln Asp Leu Ala

405 410 415

Ala Asp Arg Ala Ala Arg Gly Arg Gln Pro Arg Arg Thr Ala Glu Pro

420 425 430

His Leu Pro Arg
435

<210> 5 
<211> 390 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 5 

Met Val Thr Ser Thr Asn Leu Asp Thr Thr Ala Arg Pro Ala Leu Asn 

1 5 10 15

Ser Leu Thr Gly Met Arg Phe Val Ala Ala Phe Leu Val Phe Phe Thr

20 25 30

His Val Leu Ser Arg Leu Ile Pro Asn Ser Tyr Val Tyr Ala Asp Gly

35 40 45

Leu Asp Ala Phe Trp Gln Thr Thr Gly Arg Val Gly Val Ser Phe Phe

50 55 60

Phe Ile Leu Ser Gly Phe Val Leu Thr Trp Ser Ala Arg Ala Ser Asp
65 70 75 80

Ser Val Trp Ser Phe Trp Arg Arg Arg Val Cys Lys Leu Phe Pro Asn

85 90 95

His Leu Val Thr Ala Phe Ala Ala Val Val Leu Phe Leu Val Thr Gly

100 105 110

Gln Ala Val Ser Gly Glu Ala Leu Ile Pro Asn Leu Leu Leu Ile His

115 120 125

Ala Trp Phe Pro Ala Leu Glu Ile Ser Phe Gly Ile Asn Pro Val Ser

130 135 140

Trp Ser Leu Ala Cys Glu Ala Phe Phe Tyr Leu Cys Phe Pro Leu Phe
145 150 155 160

Leu Phe Trp Ile Ser Gly Ile Arg Pro Glu Arg Leu Trp Ala Trp Ala

165 170 175

Ala Val Val Phe Ala Ala Ile Trp Ala Val Pro Val Val Ala Asp Leu

180 185 190

Leu Leu Pro Ser Ser Pro Pro Leu Ile Pro Gly Leu Glu Tyr Ser Ala

195 200 205

Ile Gln Asp Trp Phe Leu Tyr Thr Phe Pro Ala Thr Arg Ser Leu Glu

210 215 220

Phe Ile Leu Gly Ile Ile Leu Ala Arg Ile Leu Ile Thr Gly Arg Trp
225 230 235 240

Ile Asn Val Gly Leu Leu Pro Ala Val Leu Leu Phe Pro Val Phe Phe

245 250 255

Val Ala Ser Leu Phe Leu Pro Gly Val Tyr Ala Ile Ser Ser Ser Met

260 265 270

Met Ile Leu Pro Leu Val Leu Ile Ile Ala Ser Gly Ala Thr Ala Asp

275 280 285

Leu Gln Gln Lys Arg Thr Phe Met Arg Asn Arg Val Met Val Trp Leu

290 295 300

Gly Asp Val Ser Phe Ala Leu Tyr Met Val His Phe Leu Val Ile Val
305 310 315 320

Tyr Gly Ala Asp Leu Leu Gly Phe Ser Gln Thr Glu Asp Ala Pro Leu

325 330 335

Gly Leu Ala Leu Phe Met Ile Ile Pro Phe Leu Ala Val Ser Leu Val

340 345 350 

Leu Ser Trp Leu Leu Tyr Arg Phe Val Glu Leu Pro Val Met Arg Asn 

355 360 365 

Trp Ala Arg Pro Ala Ser Ala Arg Arg Lys Pro Ala Thr Glu Pro Glu 

370 375 380 

Gln Thr Pro Ser Arg Arg
385 390

<210> 6
<211> 374 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 6 

Met Thr Thr Tyr Val Trp Ser Tyr Leu Leu Glu Tyr Glu Arg Glu Arg 

1 5 10 15

Ala Asp Ile Leu Asp Ala Val Gln Lys Val Phe Ala Ser Gly Ser Leu

20 25 30

Ile Leu Gly Gln Ser Val Glu Asn Phe Glu Thr Glu Tyr Ala Arg Tyr

35 40 45

His Gly Ile Ala His Cys Val Gly Val Asp Asn Gly Thr Asn Ala Val

50 55 60

Lys Leu Ala Leu Glu Ser Val Gly Val Gly Arg Asp Asp Glu Val Val
65 70 75 80

Thr Val Ser Asn Thr Ala Ala Pro Thr Val Leu Ala Ile Asp Glu Ile

85 90 95

Gly Ala Arg Pro Val Phe Val Asp Val Arg Asp Glu Asp Tyr Leu Met

100 105 110

Asp Thr Asp Leu Val Glu Ala Ala Val Thr Pro Arg Thr Lys Ala Ile

115 120 125

Val Pro Val His Leu Tyr Gly Gln Cys Val Asp Met Thr Ala Leu Arg

130 135 140

Glu Leu Ala Asp Arg Arg Gly Leu Lys Leu Val Glu Asp Cys Ala Gln
145 150 155 160 

Ala His Gly Ala Arg Arg Asp Gly Arg Leu Ala Gly Thr Met Ser Asp 

165 170 175 

Ala Ala Ala Phe Ser Phe Tyr Pro Thr Lys Val Leu Gly Ala Tyr Gly 

180 185 190 

Asp Gly Gly Ala Val Val Thr Asn Asp Asp Glu Thr Ala Arg Ala Leu 

195 200 205 

Arg Arg Leu Arg Tyr Tyr Gly Met Glu Glu Val Tyr Tyr Val Thr Arg 

210 215 220 

Thr Pro Gly His Asn Ser Arg Leu Asp Glu Val Gln Ala Glu Ile Leu
225 230 235 240

Arg Arg Lys Leu Thr Arg Leu Asp Ala Tyr Val Ala Gly Arg Arg Ala

245 250 255

Val Ala Gln Arg Tyr Val Asp Gly Leu Ala Asp Leu Gln Asp Ser His

260 265 270

Gly Leu Glu Leu Pro Val Val Thr Asp Gly Asn Glu His Val Phe Tyr

275 280 285

Val Tyr Val Val Arg His Pro Arg Arg Asp Glu Ile Ile Lys Arg Leu

290 295 300

Arg Asp Gly Tyr Asp Ile Ser Leu Asn Ile Ser Tyr Pro Trp Pro Val
305 310 315 320

His Thr Met Thr Gly Phe Ala His Leu Gly Val Ala Ser Gly Ser Leu

325 330 335

Pro Val Thr Glu Arg Leu Ala Gly Glu Ile Phe Ser Leu Pro Met Tyr

340 345 350

Pro Ser Leu Pro His Asp Leu Gln Asp Arg Val Ile Glu Ala Val Arg

355 360 365

Glu Val Ile Thr Gly Leu
370 

<210> 7 
<211> 257 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 7 

Met Pro Asn Ser His Ser Thr Thr Ser Ser Thr Asp Val Ala Pro Tyr 

1 5 10 15

Glu Arg Ala Asp Ile Tyr His Asp Phe Tyr His Gly Arg Gly Lys Gly

20 25 30

Tyr Arg Ala Glu Ala Asp Ala Leu Val Glu Val Ala Arg Lys His Thr

35 40 45

Pro Gln Ala Ala Thr Leu Leu Asp Val Ala Cys Gly Thr Gly Ser His

50 55 60

Leu Val Glu Leu Ala Asp Ser Phe Arg Glu Val Val Gly Val Asp Leu
65 70 75 80

Ser Ala Ala Met Leu Ala Thr Ala Ala Arg Asn Asp Pro Gly Arg Glu

85 90 95

Leu His Gln Gly Asp Met Arg Asp Phe Ser Leu Asp Arg Arg Phe Asp

100 105 110

Val Val Thr Cys Met Phe Ser Ser Thr Gly Tyr Leu Val Asp Glu Ala

115 120 125 

Glu Leu Asp Arg Ala Val Ala Asn Leu Ala Gly His Leu Ala Pro Gly 

130 135 140 

Gly Thr Leu Val Val Glu Pro Trp Trp Phe Pro Glu Thr Phe Arg Pro 
145 150 155 160 

Gly Trp Val Gly Ala Asp Leu Val Thr Ser Gly Asp Arg Arg Ile Ser

165 170 175

Arg Met Ser His Thr Val Pro Ala Gly Leu Pro Asp Arg Thr Ala Ser

180 185 190

Arg Met Thr Ile His Tyr Thr Val Gly Ser Pro Glu Ala Gly Ile Glu

195 200 205

His Phe Thr Glu Val His Val Met Thr Leu Phe Ala Arg Ala Ala Tyr

210 215 220

Glu Gln Ala Phe Gln Arg Ala Gly Leu Ser Cys Ser Tyr Val Gly His
225 230 235 240 

Asp Leu Phe Ser Pro Gly Leu Phe Val Gly Val Ala Ala Glu Pro Gly 

245 250 255 

Arg 



<210> 8 
<211> 201 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 8 



Met Arg Val Glu Glu Leu Gly Ile Glu Gly Val Phe Thr Phe Thr Pro

1 5 10 15

Gln Thr Phe Ala Asp Glu Arg Gly Val Phe Gly Thr Ala Tyr Gln Glu

20 25 30

Asp Val Phe Val Ala Ala Leu Gly Arg Pro Leu Phe Pro Val Ala Gln

35 40 45

Val Ser Thr Thr Arg Ser Arg Arg Gly Val Val Arg Gly Val His Phe

50 55 60

Thr Thr Met Pro Gly Ser Met Ala Lys Tyr Val Tyr Cys Ala Arg Gly
65 70 75 80


Ara 


Al a 




/AO V-S 


PhA 
rue 

85 


al a 

AX o 


v a j. 


Sen 


J.1C 


nl g 

on 
?u 


D ■»» *-v 

fro 


oi y 




fro 


Thr 


Phe 


Gl v 


Ara 


Ala 


Glu 

VJ X U 


Pro 


Va 1 


Glu 

V9 X u 






/A J_ a 


ul U 






v ai 


L»ly 


Leu 








100 










105 










110 




fur 




P rn 


v ax 


Gl v 
<Jiy 


l. J t. i. 


Gl v 


Hie: 

n j. o 


T oil 
J_> li 


Ph o 


Vdl 




Leu 




Asp 


Asp 






115 










i ?n 










1 oc 
-1 £. D 










Vdl 


T* \r -r- 

i yr 


Lieu 






Hia 


Gly 


Tyr 


val 


Pro 


Asp 


Lys 


Glu 




130 










135 










140 








Arg Ala Val His Pro Leu Asp Pro Glu Leu Ala Leu Pro Ile Pro Ala
145 150 155 160

Asp Leu Asp Leu Val Met Ser Glu Arg Asp Arg Val Ala Pro Thr Leu

165 170 175

Arg Glu Ala Arg Asp Gln Gly Ile Leu Pro Asp Tyr Ala Ala Cys Arg

180 185 190

Ala Ala Ala His Arg Val Val Arg Thr
195 200

<210> 9 

<211> 328 

<212> PRT 

<213> Micromonospora megalomicea

<400> 9

Met Val Val Leu Gly Ala Ser Gly Phe Leu Gly Ser Ala Val Thr His

1 5 10 15

Ala Leu Ala Asp Leu Pro Val Arg Val Arg Leu Val Ala Arg Arg Glu

20 25 30

Val Val Val Pro Ser Gly Ala Val Ala Asp Tyr Glu Thr His Arg Val

35 40 45

Asp Leu Thr Glu Pro Gly Ala Leu Ala Glu Val Val Ala Asp Ala Arg

50 55 60

Ala Val Phe Pro Phe Ala Ala Gln Ile Arg Gly Thr Ser Gly Trp Arg
65 70 75 80

Ile Ser Glu Asp Asp Val Val Ala Glu Arg Thr Asn Val Gly Leu Val

85 90 95

Arg Asp Leu Ile Ala Val Leu Ser Arg Ser Pro His Ala Pro Val Val

100 105 110

Val Phe Pro Gly Ser Asn Thr Gln Val Gly Arg Val Thr Ala Gly Arg

115 120 125

Val Ile Asp Gly Ser Glu Gln Asp His Pro Glu Gly Val Tyr Asp Arg

130 135 140

Gln Lys His Thr Gly Glu Gln Leu Leu Lys Glu Ala Thr Ala Ala Gly
145 150 155 160

Ala Ile Arg Ala Thr Ser Leu Arg Leu Pro Pro Val Phe Gly Val Pro

165 170 175

Ala Ala Gly Thr Ala Asp Asp Arg Gly Val Val Ser Thr Met Ile Arg

180 185 190

Arg Ala Leu Thr Gly Gln Pro Leu Thr Met Trp His Asp Gly Thr Val

195 200 205

Arg Arg Glu Leu Leu Tyr Val Thr Asp Ala Ala Arg Ala Phe Val Thr

210 215 220

Ala Leu Asp His Ala Asp Ala Leu Ala Gly Arg His Phe Leu Leu Gly
225 230 235 240

Thr Gly Arg Ser Trp Pro Leu Gly Glu Val Phe Gln Ala Val Ser Arg

245 250 255

Ser Val Ala Arg His Thr Gly Glu Asp Pro Val Pro Val Val Ser Val

260 265 270

Pro Pro Pro Ala His Met Asp Pro Ser Asp Leu Arg Ser Val Glu Val

275 280 285

Asp Pro Ala Arg Phe Thr Ala Val Thr Gly Trp Arg Ala Thr Val Thr

290 295 300

Met Ala Glu Ala Val Asp Arg Thr Val Ala Ala Leu Ala Pro Arg Arg 
305 310 315 320

Ala Ala Ala Pro Ser Glu Pro Ser 

325 

<210> 10 
<211> 330 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 10 

Met Gly Thr Thr Gly Ala Gly Ser Ala Arg Val Arg Val Gly Arg Ser 

1 5 10 15

Ala Leu His Thr Ser Arg Leu Trp Leu Gly Thr Val Asn Phe Ser Gly

20 25 30

Arg Val Thr Asp Asp Asp Ala Leu Arg Leu Met Asp His Ala Leu Glu

35 40 45

Arg Gly Val Asn Cys Ile Asp Thr Ala Asp Ile Tyr Gly Trp Arg Leu

50 55 60

Tyr Lys Gly His Thr Glu Glu Leu Val Gly Arg Trp Phe Ala Gln Gly
65 70 75 80

Gly Gly Arg Arg Glu Glu Thr Val Leu Ala Thr Lys Val Gly Ser Glu

85 90 95

Met Ser Glu Arg Val Asn Asp Gly Gly Leu Ser Ala Arg His Ile Val

100 105 110

Ala Ala Cys Glu Asn Ser Leu Arg Arg Leu Gly Val Asp His Ile Asp

115 120 125

Ile Tyr Gln Thr His His Ile Asp Arg Ala Ala Pro Trp Asp Glu Val

130 135 140

Trp Gln Ala Ala Glu His Leu Val Gly Ser Gly Lys Val Gly Tyr Val
145 150 155 160

Gly Ser Ser Asn Leu Ala Gly Trp His Ile Ala Ala Ala Gln Glu Ser

165 170 175

Ala Ala Arg Arg Asn Leu Leu Gly Met Ile Ser His Gln Cys Leu Tyr

180 185 190

Asn Leu Ala Val Arg His Pro Glu Leu Asp Val Leu Pro Ala Ala Gln

195 200 205 

Ala Tyr Gly Val Gly Val Phe Ala Trp Ser Pro Leu His Gly Gly Leu 

210 215 220 

Leu Ser Gly Val Leu Glu Lys Leu Ala Ala Gly Thr Ala Val Lys Ser 
225 230 235 240 

Ala Gln Gly Arg Ala Gln Val Leu Leu Pro Ala Val Arg Pro Leu Val

245 250 255

Glu Ala Tyr Glu Asp Tyr Cys Arg Arg Leu Gly Ala Asp Pro Ala Glu

260 265 270

Val Gly Leu Ala Trp Val Leu Ser Arg Pro Gly Ile Leu Gly Ala Val

275 280 285

Ile Gly Pro Arg Thr Pro Glu Gln Leu Asp Ser Ala Leu Arg Ala Ala

290 295 300

Glu Leu Thr Leu Gly Glu Glu Glu Leu Arg Glu Leu Glu Ala Ile Phe
305 310 315 320 

Pro Ala Pro Ala Val Asp Gly Pro Val Pro 

325 330 

<210> 11 
<211> 417 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 11 
Met Arg Val Leu Leu Thr Ser Phe Ala His Arg Thr His Phe Gln Gly

1 5 10 15

Leu Val Pro Leu Ala Trp Ala Leu His Thr Ala Gly His Asp Val Arg

20 25 30

Val Ala Ser Gln Pro Glu Leu Thr Asp Val Val Val Gly Ala Gly Leu

35 40 45

Thr Ser Val Pro Leu Gly Ser Asp His Arg Leu Phe Asp Ile Ser Pro

50 55 60

Glu Ala Ala Ala Gln Val His Arg Tyr Thr Thr Asp Leu Asp Phe Ala
65 70 75 80

Arg Arg Gly Pro Glu Leu Arg Ser Trp Glu Phe Leu His Gly Ile Glu

85 90 95

Glu Ala Thr Ser Arg Phe Val Phe Pro Val Val Asn Asn Asp Ser Phe

100 105 110

Val Asp Glu Leu Val Glu Phe Ala Met Asp Trp Arg Pro Asp Leu Val

115 120 125

Leu Trp Glu Pro Phe Thr Phe Ala Gly Ala Val Ala Ala Lys Ala Cys

130 135 140

Gly Ala Ala His Ala Arg Leu Leu Trp Gly Ser Asp Leu Thr Gly Tyr
145 150 155 160

Phe Arg Ser Arg Ser Gln Asp Leu Arg Gly Gln Arg Pro Ala Asp Asp

165 170 175

Arg Pro Asp Pro Leu Gly Gly Trp Leu Thr Glu Val Ala Gly Arg Phe

180 185 190

Gly Leu Asp Tyr Ser Glu Asp Leu Ala Val Gly Gln Trp Ser Val Asp

195 200 205

Gln Leu Pro Glu Ser Phe Arg Leu Glu Thr Gly Leu Glu Ser Val His

210 215 220

Thr Arg Thr Leu Pro Tyr Asn Gly Ser Ser Val Val Pro Gln Trp Leu
225 230 235 240

Arg Thr Ser Asp Gly Val Arg Arg Val Cys Phe Thr Gly Gly Tyr Ser

245 250 255

Ala Leu Gly Ile Thr Ser Asn Pro Gln Glu Phe Leu Arg Thr Leu Ala

260 265 270

Thr Leu Ala Arg Phe Asp Gly Glu Ile Val Val Thr Arg Ser Gly Leu

275 280 285

Asp Pro Ala Ser Val Pro Asp Asn Val Arg Leu Val Asp Phe Val Pro

290 295 300

Met Asn Ile Leu Leu Pro Gly Cys Ala Ala Val Ile His His Gly Gly
305 310 315 320

Ala Gly Ser Trp Ala Thr Ala Leu His His Gly Val Pro Gln Ile Ser

325 330 335

Val Ala His Glu Trp Asp Cys Val Leu Arg Gly Gln Arg Thr Ala Glu

340 345 350

Leu Gly Ala Gly Val Phe Leu Arg Pro Asp Glu Val Asp Ala Asp Thr

355 360 365

Leu Trp Gln Ala Leu Ala Thr Val Val Glu Asp Arg Ser His Ala Glu

370 375 380

Asn Ala Glu Lys Leu Arg Gln Glu Ala Leu Ala Ala Pro Thr Pro Ala
385 390 395 400

Glu Val Val Pro Val Leu Glu Ala Leu Ala His Gln His Arg Ala Asp

405 410 415

Arg

<210> 12
<211> 313
<212> PRT

<213> Micromonospora megalomicea
<400> 12

Met Thr Arg His Val Thr Leu Leu Gly Val Ser Gly Phe Val Gly Ser

1 5 10 15 

Ala Leu Leu Arg Glu Phe Thr Thr His Pro Leu Arg Leu Arg Ala Val 

20 25 30

Ala Arg Thr Gly Ser Arg Asp Gln Pro Pro Gly Ser Ala Gly Ile Glu

35 40 45

His Leu Arg Val Asp Leu Leu Glu Pro Gly Arg Val Ala Gln Val Val

50 55 60

Ala Asp Thr Asp Val Val Val His Leu Val Ala Tyr Ala Ala Gly Gly
65 70 75 80

Ser Thr Trp Arg Ser Ala Ala Thr Val Pro Glu Ala Glu Arg Val Asn

85 90 95

Ala Gly Ile Met Arg Asp Leu Val Ala Ala Leu Arg Ala Arg Pro Gly

100 105 110

Pro Ala Pro Val Leu Leu Phe Ala Ser Thr Thr Gln Ala Ala Asn Pro

115 120 125

Ala Ala Pro Ser Arg Tyr Ala Gln His Lys Ile Glu Ala Glu Arg Ile

130 135 140

Leu Arg Gln Ala Thr Glu Asp Gly Val Val Asp Gly Val Ile Leu Arg
145 150 155 160

Leu Pro Ala Ile Tyr Gly His Ser Gly Pro Ser Gly Gln Thr Gly Arg

165 170 175

Gly Val Val Thr Ala Met Ile Arg Arg Ala Leu Ala Gly Glu Pro Ile

180 185 190 

Thr Met Trp His Glu Gly Ser Val Arg Arg Asn Leu Leu His Val Glu 

195 200 205 

Asp Val Ala Thr Ala Phe Thr Ala Ala Leu His Asn His Glu Ala Leu 

210 215 220 

Val Gly Asp Val Trp Thr Pro Ser Ala Asp Glu Ala Arg Pro Leu Gly 
225 230 235 240 

Glu Ile Phe Glu Thr Val Ala Ala Ser Val Ala Arg Gln Thr Gly Asn

245 250 255 

Pro Ala Val Pro Val Val Ser Val Pro Pro Pro Glu Asn Ala Glu Ala 

260 265 270 

Asn Asp Phe Arg Ser Asp Asp Phe Asp Ser Thr Glu Phe Arg Thr Leu 

275 280 285 

Thr Gly Trp His Pro Arg Val Pro Leu Ala Glu Gly Ile Asp Arg Thr

290 295 300

Val Ala Ala Leu Ile Ser Thr Lys Glu
305 310 

<210> 13 
<211> 3546 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 13 

Met Val Asp Val Pro Asp Leu Leu Gly Thr Arg Thr Pro His Pro Gly 

1 5 10 15 

Pro Leu Pro Phe Pro Trp Pro Leu Cys Gly His Asn Glu Pro Glu Leu 

20 25 30 

Arg Ala Arg Ala Arg Gln Leu His Ala Tyr Leu Glu Gly Ile Ser Glu

35 40 45

Asp Asp Val Val Ala Val Gly Ala Ala Leu Ala Arg Glu Thr Arg Ala

50 55 60

Gln Asp Gly Pro His Arg Ala Val Val Val Ala Ser Ser Val Thr Glu
65 70 75 80

Leu Thr Ala Ala Leu Ala Ala Leu Ala Gln Gly Arg Pro His Pro Ser

85 90 95

Val Val Arg Gly Val Ala Arg Pro Thr Ala Pro Val Val Phe Val Leu

100 105 110

Pro Gly Gln Gly Ala Gln Trp Pro Gly Met Ala Thr Arg Leu Leu Ala

115 120 125

Glu Ser Pro Val Phe Ala Ala Ala Met Arg Ala Cys Glu Arg Ala Phe

130 135 140

Asp Glu Val Thr Asp Trp Ser Leu Thr Glu Val Leu Asp Ser Pro Glu
145 150 155 160

His Leu Arg Arg Val Glu Val Val Gln Pro Ala Leu Phe Ala Val Gln

165 170 175

Thr Ser Leu Ala Ala Leu Trp Arg Ser Phe Gly Val Arg Pro Asp Ala

180 185 190

Val Leu Gly His Ser Ile Gly Glu Leu Ala Ala Ala Glu Val Cys Gly

195 200 205

Ala Val Asp Val Glu Ala Ala Ala Arg Ala Ala Ala Leu Trp Ser Arg

210 215 220

Glu Met Val Pro Leu Val Gly Arg Gly Asp Met Ala Ala Val Ala Leu
225 230 235 240

Ser Pro Ala Glu Leu Ala Ala Arg Val Glu Arg Trp Asp Asp Asp Val

245 250 255

Val Pro Ala Gly Val Asn Gly Pro Arg Ser Val Leu Leu Thr Gly Ala

260 265 270

Pro Glu Pro Ile Ala Arg Arg Val Ala Glu Leu Ala Ala Gln Gly Val

275 280 285

Arg Ala Gln Val Val Asn Val Ser Met Ala Ala His Ser Ala Gln Val

290 295 300

Asp Ala Val Ala Glu Gly Met Arg Ser Ala Leu Thr Trp Phe Ala Pro
305 310 315 320

Gly Asp Ser Asp Val Pro Tyr Tyr Ala Gly Leu Thr Gly Gly Arg Leu

325 330 335

Asp Thr Arg Glu Leu Gly Ala Asp His Trp Pro Arg Ser Phe Arg Leu

340 345 350

Pro Val Arg Phe Asp Glu Ala Thr Arg Ala Val Leu Glu Leu Gln Pro

355 360 365

Gly Thr Phe Ile Glu Ser Ser Pro His Pro Val Leu Ala Ala Ser Leu

370 375 380










Gin 


Gin 


Thr 


Leu 


Asp 


Glu 


Val 


Gly 


Ser 


Pro 


Ala 


Ala 


He 


Val 


Pro 


Thr 


385 










390 










395 










400 


Leu 


Gin 


Arg 


Asp 


Gin 


Gly 


Gly 


Leu 


Arg 


Arg 


Phe 


Leu 


Leu 


Ala 


Val 


Ala 










405 










410 










415 




Gin 


Ala 


Tyr 


Thr 


Gly 


Gly 


Val 


Thr 


Val 


Asp 


Trp 


Thr 


Ala 


Ala 


Tyr 


Pro 








420 










425 










430 




Gly 


Val 


Thr 


Pro 


Gly 


His 


Leu 


Pro 


Ser 


Ala 


Val 


Ala 


Val 


Glu 


Thr 


Asp 






435 










440 










445 






Glu 


Gly 


Pro 


Ser 


Thr 


Glu 


Phe 


Asp 


Trp 


Ala 


Ala 


Pro 


Asp 


His 


Val 


Leu 




450 










455 










4 60 










Arg 


Ala 


Arg 


Leu 


Leu 


Glu 


He 


Val 


Gly 


Ala 


Glu 


Thr 


Ala 


Ala 


Leu 


Ala 


465 










470 










475 










480 


Gly 


Arg 


Glu 


Val 


Asp 


Ala 


Arg 


Ala 


Thr 


Phe 


Arg 


Glu 


Leu 


Gly 


Leu 


Asp 










485 










490 










495 


Ser 


Val 


Leu 


Ala 


Val 


Gin 


Leu 


Arg 


Thr 


Arg 


Leu 


Ala 


Thr 


Ala 


Thr 


Gly 








500 










505 










510 




Arg 


Asp 


Leu 


His 


He 


Ala 


Met 


Leu 


Tyr 


Asp 


His 


Pro 


Thr 


Pro 


His 


Ala 






515 










520 










525 








Leu 


Thr 


Glu 


Ala 


Leu 


Leu 


Arg 


Gly 


Pro 


Gin 


Glu 


Glu 


Pro 


Gly 


Arg 


Gly 




530 










535 










540 






Glu 


Glu 


Thr 


Ala 


His 


. Pro 


Thr 


Glu 


Ala 


Glu 


Pro 


Asp 


Glu 


Pro 


Val 


Ala 


545 










550 










555 










560 


Val 


Val 


Ala 


Met 


Ala 


Cys 


Arg 


Leu 


Pro 


Gly 


Gly 


Val 


Thr 


Ser 


Pro 


Glu 










565 










570 










575 




Glu 


Phe 


Trp 


Glu 


Leu 


Leu 


Ala 


Glu 


Gly 


Arg 


Asp 


Ala 


Val 


Gly 


Gly 


Leu 








580 










585 










590 






Pro 


Thr 


Asp 


Arg 


Gly 


Trp 


Asp 


Leu 


Asp 


Ser 


Leu 


Phe 


His 


Pro 


Asp 


Pro 
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595 600 605 

Thr Arg Ser Gly Thr Ala His Gln Arg Ala Gly Gly Phe Leu Thr Gly

610 615 620 

Ala Thr Ser Phe Asp Ala Ala Phe Phe Gly Leu Ser Pro Arg Glu Ala 
625 630 635 640 

Leu Ala Val Glu Pro Gln Gln Arg Ile Thr Leu Glu Leu Ser Trp Glu

645 650 655

Val Leu Glu Arg Ala Gly Ile Pro Pro Thr Ser Leu Arg Thr Ser Arg

660 665 670

Thr Gly Val Phe Val Gly Leu Ile Pro Gln Glu Tyr Gly Pro Arg Leu

675 680 685 

Ala Glu Gly Gly Glu Gly Val Glu Gly Tyr Leu Met Thr Gly Thr Thr 

690 695 700 

Thr Ser Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly 
705 710 715 720 

Pro Ala Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val

725 730 735

His Leu Ala Cys Gln Ser Leu Arg Arg Gly Glu Ser Thr Met Ala Leu

740 745 750 

Ala Gly Gly Val Thr Val Met Pro Thr Pro Gly Met Leu Val Asp Phe 

755 760 765 

Ser Arg Met Asn Ser Leu Ala Pro Asp Gly Arg Ser Lys Ala Phe Ser 

770 775 780 

Ala Ala Ala Asp Gly Phe Gly Met Ala Glu Gly Ala Gly Met Leu Leu 
785 790 795 800 

Leu Glu Arg Leu Ser Asp Ala Arg Arg His Gly His Pro Val Leu Ala 

805 810 815 

Val Ile Arg Gly Thr Ala Val Asn Ser Asp Gly Ala Ser Asn Gly Leu

820 825 830

Ser Ala Pro Asn Gly Arg Ala Gln Val Arg Val Ile Arg Gln Ala Leu

835 840 845

Ala Glu Ser Gly Leu Thr Pro His Thr Val Asp Val Val Glu Thr His

850 855 860

Gly Thr Gly Thr Arg Leu Gly Asp Pro Ile Glu Ala Arg Ala Leu Ser
865 870 875 880

Asp Ala Tyr Gly Gly Asp Arg Glu His Pro Leu Arg Ile Gly Ser Val

885 890 895

Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Leu

900 905 910

Ile Lys Leu Val Leu Ala Met Gln Ala Gly Val Leu Pro Arg Thr Leu

915 920 925

His Ala Asp Glu Pro Ser Pro Glu Ile Asp Trp Ser Ser Gly Ala Ile

930 935 940

Ser Leu Leu Gln Glu Pro Ala Ala Trp Pro Ala Gly Glu Arg Pro Arg
945 950 955 960

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Ala

965 970 975

Ile Ile Glu Glu Ala Pro Pro Thr Gly Asp Asp Thr Arg Pro Asp Arg

980 985 990 

Met Gly Pro Val Val Pro Trp Val Leu Ser Ala Ser Thr Gly Glu Ala 

995 1000 1005 

Leu Arg Ala Arg Ala Ala Arg Leu Ala Gly His Leu Arg Glu His Pro 

1010 1015 1020 

Asp Gln Asp Leu Asp Asp Val Ala Tyr Ser Leu Ala Thr Gly Arg Ala
1025 1030 1035 1040 

Ala Leu Ala Tyr Arg Ser Gly Phe Val Pro Ala Asp Ala Ser Thr Ala 

1045 1050 1055 

Leu Arg Ile Leu Asp Glu Leu Ala Ala Gly Gly Ser Gly Asp Ala Val

1060 1065 1070

Thr Gly Thr Ala Arg Ala Pro Gln Arg Val Val Phe Val Phe Pro Gly
1075 1080 1085

Gln Gly Trp Gln Trp Ala Gly Met Ala Val Asp Leu Leu Asp Gly Asp

1090 1095 1100

Pro Val Phe Ala Ser Val Leu Arg Glu Cys Ala Asp Ala Leu Glu Pro
1105 1110 1115 1120

Tyr Leu Asp Phe Glu Ile Val Pro Phe Leu Arg Ala Glu Ala Gln Arg

1125 1130 1135

Arg Thr Pro Asp His Thr Leu Ser Thr Asp Arg Val Asp Val Val Gln

1140 1145 1150

Pro Val Leu Phe Ala Val Met Val Ser Leu Ala Ala Arg Trp Arg Ala

1155 1160 1165

Tyr Gly Val Glu Pro Ala Ala Val Ile Gly His Ser Gln Gly Glu Ile

1170 1175 1180 

Ala Ala Ala Cys Val Ala Gly Ala Leu Ser Leu Asp Asp Ala Ala Arg 
1185 1190 1195 1200 

Ala Val Ala Leu Arg Ser Arg Val Ile Ala Thr Met Pro Gly Asn Gly

1205 1210 1215

Ala Met Ala Ser Ile Ala Ala Ser Val Asp Glu Val Ala Ala Arg Ile

1220 1225 1230

Asp Gly Arg Val Glu Ile Ala Ala Val Asn Gly Pro Arg Ala Val Val

1235 1240 1245 

Val Ser Gly Asp Arg Asp Asp Leu Asp Arg Leu Val Ala Ser Cys Thr 

1250 1255 1260 

Val Glu Gly Val Arg Ala Lys Arg Leu Pro Val Asp Tyr Ala Ser His 
1265 1270 1275 1280 

Ser Ser His Val Glu Ala Val Arg Asp Ala Leu His Ala Glu Leu Gly 

1285 1290 1295 

Glu Phe Arg Pro Leu Pro Gly Phe Val Pro Phe Tyr Ser Thr Val Thr 

1300 1305 1310 

Gly Arg Trp Val Glu Pro Ala Glu Leu Asp Ala Gly Tyr Trp Phe Arg 

1315 1320 1325 

Asn Leu Arg His Arg Val Arg Phe Ala Asp Ala Val Arg Ser Leu Ala 

1330 1335 1340

Asp Gln Gly Tyr Thr Thr Phe Leu Glu Val Ser Ala His Pro Val Leu
1345 1350 1355 1360

Thr Thr Ala Ile Glu Glu Ile Gly Glu Asp Arg Gly Gly Asp Leu Val

1365 1370 1375

Ala Val His Ser Leu Arg Arg Gly Ala Gly Gly Pro Val Asp Phe Gly

1380 1385 1390 

Ser Ala Leu Ala Arg Ala Phe Val Ala Gly Val Ala Val Asp Trp Glu 

1395 1400 1405 

Ser Ala Tyr Gln Gly Ala Gly Ala Arg Arg Val Pro Leu Pro Thr Tyr

1410 1415 1420

Pro Phe Gln Arg Glu Arg Phe Trp Leu Glu Pro Asn Pro Ala Arg Arg
1425 1430 1435 1440

Val Ala Asp Ser Asp Asp Val Ser Ser Leu Arg Tyr Arg Ile Glu Trp

1445 1450 1455 

His Pro Thr Asp Pro Gly Glu Pro Gly Arg Leu Asp Gly Thr Trp Leu 

1460 1465 1470 

Leu Ala Thr Tyr Pro Gly Arg Ala Asp Asp Arg Val Glu Ala Ala Arg 

1475 1480 1485 

Gln Ala Leu Glu Ser Ala Gly Ala Arg Val Glu Asp Leu Val Val Glu

1490 1495 1500 

Pro Arg Thr Gly Arg Val Asp Leu Val Arg Arg Leu Asp Ala Val Gly 
1505 1510 1515 1520 

Pro Val Ala Gly Val Leu Cys Leu Phe Ala Val Ala Glu Pro Ala Ala 

1525 1530 1535 

Glu His Ser Pro Leu Ala Val Thr Ser Leu Ser Asp Thr Leu Asp Leu 

1540 1545 1550 

Thr Gln Ala Val Ala Gly Ser Gly Arg Glu Cys Pro Ile Trp Val Val

1555 1560 1565

Thr Glu Asn Ala Val Ala Val Gly Pro Phe Glu Arg Leu Arg Asp Pro

1570 1575 1580

Ala His Gly Ala Leu Trp Ala Leu Gly Arg Val Val Ala Leu Glu Asn 
1585 1590 1595 1600 

Pro Ala Val Trp Gly Gly Leu Val Asp Val Pro Ser Gly Ser Val Ala 

1605 1610 1615 

Glu Leu Ser Arg His Leu Gly Thr Thr Leu Ser Gly Ala Gly Glu Asp 

1620 1625 1630 

Gln Val Ala Leu Arg Pro Asp Gly Thr Tyr Ala Arg Arg Trp Cys Arg

1635 1640 1645

Ala Gly Ala Gly Gly Thr Gly Arg Trp Gln Pro Arg Gly Thr Val Leu

1650 1655 1660 

Val Thr Gly Gly Thr Gly Gly Val Gly Arg His Val Ala Arg Trp Leu 
1665 1670 1675 1680

Ala Arg Gln Gly Thr Pro Cys Leu Val Leu Ala Ser Arg Arg Gly Pro

1685 1690 1695

Asp Ala Asp Gly Val Glu Glu Leu Leu Thr Glu Leu Ala Asp Leu Gly

1700 1705 1710

Thr Arg Ala Thr Val Thr Ala Cys Asp Val Thr Asp Arg Glu Gln Leu

1715 1720 1725 

Arg Ala Leu Leu Ala Thr Val Asp Asp Glu His Pro Leu Ser Ala Val 

1730 1735 1740 

Phe His Val Ala Ala Thr Leu Asp Asp Gly Thr Val Glu Thr Leu Thr
1745 1750 1755 1760

Gly Asp Arg Ile Glu Arg Ala Asn Arg Ala Lys Val Leu Gly Ala Arg

1765 1770 1775 

Asn Leu His Glu Leu Thr Arg Asp Ala Asp Leu Asp Ala Phe Val Leu 

1780 1785 1790 

Phe Ser Ser Ser Thr Ala Ala Phe Gly Ala Pro Gly Leu Gly Gly Tyr 

1795 1800 1805 

Val Pro Gly Asn Ala Tyr Leu Asp Gly Leu Ala Gln Gln Arg Arg Ser

1810 1815 1820

Glu Gly Leu Pro Ala Thr Ser Val Ala Trp Gly Thr Trp Ala Gly Ser
1825 1830 1835 1840

Gly Met Ala Glu Gly Pro Val Ala Asp Arg Phe Arg Arg His Gly Val

1845 1850 1855

Met Glu Met His Pro Asp Gln Ala Val Glu Gly Leu Arg Val Ala Leu

1860 1865 1870

Val Gln Gly Glu Val Ala Pro Ile Val Val Asp Ile Arg Trp Asp Arg

1875 1880 1885

Phe Leu Leu Ala Tyr Thr Ala Gln Arg Pro Thr Arg Leu Phe Asp Thr

1890 1895 1900 

Leu Asp Glu Ala Arg Arg Ala Ala Pro Gly Pro Asp Ala Gly Pro Gly 
1905 1910 1915 1920 

Val Ala Ala Leu Ala Gly Leu Pro Val Gly Glu Arg Glu Lys Ala Val 

1925 1930 1935 

Leu Asp Leu Val Arg Thr His Ala Ala Ala Val Leu Gly His Ala Ser 

1940 1945 1950 

Ala Glu Gln Val Pro Val Asp Arg Ala Phe Ala Glu Leu Gly Val Asp

1955 1960 1965

Ser Leu Ser Ala Leu Glu Leu Arg Asn Arg Leu Thr Thr Ala Thr Gly

1970 1975 1980

Val Arg Leu Ala Thr Thr Thr Val Phe Asp His Pro Asp Val Arg Thr 
1985 1990 1995 2000 

Leu Ala Gly His Leu Ala Ala Glu Leu Gly Gly Gly Ser Gly Arg Glu 

2005 2010 2015 

Arg Pro Gly Gly Glu Ala Pro Thr Val Ala Pro Thr Asp Glu Pro Ile

2020 2025 2030

Ala Ile Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val Asp Ser Pro

2035 2040 2045

Glu Gln Leu Trp Glu Leu Ile Val Ser Gly Arg Asp Thr Ala Ser Ala

2050 2055 2060

Ala Pro Gly Asp Arg Ser Trp Asp Pro Ala Glu Leu Met Val Ser Asp
2065 2070 2075 2080 

Thr Thr Gly Thr Arg Thr Ala Phe Gly Asn Phe Met Pro Gly Ala Gly 

2085 2090 2095 

Glu Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

2100 2105 2110

Met Asp Pro Gln Gln Arg His Ala Leu Glu Thr Thr Trp Glu Ala Leu

2115 2120 2125

Glu Asn Ala Gly Ile Arg Pro Glu Ser Leu Arg Gly Thr Asp Thr Gly

2130 2135 2140

Val Phe Val Gly Met Ser His Gln Gly Tyr Ala Thr Gly Arg Pro Lys
2145 2150 2155 2160 

Pro Glu Asp Glu Val Asp Gly Tyr Leu Leu Thr Gly Asn Thr Ala Ser 

2165 2170 2175 

Val Ala Ser Gly Arg Ile Ala Tyr Val Leu Gly Leu Glu Gly Pro Ala

2180 2185 2190

Ile Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Leu His Val

2195 2200 2205 

Ala Ala Gly Ser Leu Arg Ser Gly Asp Cys Gly Leu Ala Val Ala Gly 

2210 2215 2220 

Gly Val Ser Val Met Ala Gly Pro Glu Val Phe Arg Glu Phe Ser Arg 
2225 2230 2235 2240

Gln Gly Ala Leu Ala Pro Asp Gly Arg Cys Lys Pro Phe Ser Asp Glu

2245 2250 2255

Ala Asp Gly Phe Gly Leu Gly Glu Gly Ser Ala Phe Val Val Leu Gln

2260 2265 2270 

Arg Leu Ser Val Ala Val Arg Glu Gly Arg Arg Val Leu Gly Val Val 

2275 2280 2285 

Val Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala

2290 2295 2300

Pro Ser Gly Val Ala Gln Gln Arg Val Ile Arg Arg Ala Trp Gly Arg
2305 2310 2315 2320

Ala Gly Val Ser Gly Gly Asp Val Gly Val Val Glu Ala His Gly Thr 

2325 2330 2335 

Gly Thr Arg Leu Gly Asp Pro Val Glu Leu Gly Ala Leu Leu Gly Thr 

2340 2345 2350 

Tyr Gly Val Gly Arg Gly Gly Val Gly Pro Val Val Val Gly Ser Val 

2355 2360 2365 

Lys Ala Asn Val Gly His Val Gln Ala Ala Ala Gly Val Val Gly Val

2370 2375 2380

Ile Lys Val Val Leu Gly Leu Gly Arg Gly Leu Val Gly Pro Met Val
2385 2390 2395 2400 

Cys Arg Gly Gly Leu Ser Gly Leu Val Asp Trp Ser Ser Gly Gly Leu 

2405 2410 2415 

Val Val Ala Asp Gly Val Arg Gly Trp Pro Val Gly Val Asp Gly Val 

2420 2425 2430 

Arg Arg Gly Gly Val Ser Ala Phe Gly Val Ser Gly Thr Asn Ala His 

2435 2440 2445 

Val Val Val Ala Glu Ala Pro Gly Ser Val Val Gly Ala Glu Arg Pro 

2450 2455 2460 

Val Glu Gly Ser Ser Arg Gly Leu Val Gly Val Val Gly Gly Val Val 
2465 2470 2475 2480 

Pro Val Val Leu Ser Ala Lys Thr Glu Thr Ala Leu His Ala Gln Ala

2485 2490 2495 

Arg Arg Leu Ala Asp His Leu Glu Thr His Pro Asp Val Pro Met Thr 

2500 2505 2510 

Asp Val Val Trp Thr Leu Thr Gln Ala Arg Gln Arg Phe Asp Arg Arg

2515 2520 2525

Ala Val Leu Leu Ala Ala Asp Arg Thr Gln Ala Val Glu Arg Leu Arg

2530 2535 2540 

Gly Leu Ala Gly Gly Glu Pro Gly Thr Gly Val Val Ser Gly Val Ala
2545 2550 2555 2560

Ser Gly Gly Gly Val Val Phe Val Phe Pro Gly Gln Gly Gly Gln Trp

2565 2570 2575 

Val Gly Met Ala Arg Gly Leu Leu Ser Val Pro Val Phe Val Glu Ser 

2580 2585 2590 

Val Val Glu Cys Asp Ala Val Val Ser Ser Val Val Gly Phe Ser Val 

2595 2600 2605 

Leu Gly Val Leu Glu Gly Arg Ser Gly Ala Pro Ser Leu Asp Arg Val 

2610 2615 2620

Asp Val Val Gln Pro Val Leu Phe Val Val Met Val Ser Leu Ala Arg
2625 2630 2635 2640 

Leu Trp Arg Trp Cys Gly Val Val Pro Ala Ala Val Val Gly His Ser 

2645 2650 2655 

Gln Gly Glu Ile Ala Ala Ala Val Val Ala Gly Val Leu Ser Val Gly

2660 2665 2670 

Asp Gly Ala Arg Val Val Ala Leu Arg Ala Arg Ala Leu Arg Ala Leu 

2675 2680 2685 

Ala Gly His Gly Gly Met Ala Ser Val Arg Arg Gly Arg Asp Asp Val 

2690 2695 2700 

Gln Lys Leu Leu Asp Ser Gly Pro Trp Thr Gly Lys Leu Glu Ile Ala
2705 2710 2715 2720 

Ala Val Asn Gly Pro Asp Ala Val Val Val Ser Gly Asp Pro Arg Ala 

2725 2730 2735 

Val Thr Glu Leu Val Glu His Cys Asp Gly Ile Gly Val Arg Ala Arg

2740 2745 2750

Thr Ile Pro Val Asp Tyr Ala Ser His Ser Ala Gln Val Glu Ser Leu

2755 2760 2765

Arg Glu Glu Leu Leu Ser Val Leu Ala Gly Ile Glu Gly Arg Pro Ala

2770 2775 2780 

Thr Val Pro Phe Tyr Ser Thr Leu Thr Gly Gly Phe Val Asp Gly Thr 
2785 2790 2795 2800 

Glu Leu Asp Ala Asp Tyr Trp Tyr Arg Asn Leu Arg His Pro Val Arg

2805 2810 2815 

Phe His Ala Ala Val Glu Ala Leu Ala Ala Arg Asp Leu Thr Thr Phe 

2820 2825 2830 

Val Glu Val Ser Pro His Pro Val Leu Ser Met Ala Val Gly Glu Thr 

2835 2840 2845 

Leu Ala Asp Val Glu Ser Ala Val Thr Val Gly Thr Leu Glu Arg Asp 

2850 2855 2860 

Thr Asp Asp Val Glu Arg Phe Leu Thr Ser Leu Ala Glu Ala His Val 
2865 2870 2875 2880 

His Gly Val Pro Val Asp Trp Ala Ala Val Leu Gly Ser Gly Thr Leu 

2885 2890 2895 

Val Asp Leu Pro Thr Tyr Pro Phe Gln Gly Arg Arg Phe Trp Leu His

2900 2905 2910 

Pro Asp Arg Gly Pro Arg Asp Asp Val Ala Asp Trp Phe His Arg Val 

2915 2920 2925 

Asp Trp Thr Ala Thr Ala Thr Asp Gly Ser Ala Arg Leu Asp Gly Arg 

2930 2935 2940 

Trp Leu Val Val Val Pro Glu Gly Tyr Thr Asp Asp Gly Trp Val Val 
2945 2950 2955 2960 

Glu Val Arg Ala Ala Leu Ala Ala Gly Gly Ala Glu Pro Val Val Thr 

2965 2970 2975 

Thr Val Glu Glu Val Thr Asp Arg Val Gly Asp Ser Asp Ala Val Val 

2980 2985 2990 

Ser Met Leu Gly Leu Ala Asp Asp Gly Ala Ala Glu Thr Leu Ala Leu 

2995 3000 3005 

Leu Arg Arg Leu Asp Ala Gln Ala Ser Thr Thr Pro Leu Trp Val Val

3010 3015 3020

Thr Val Gly Ala Val Ala Pro Ala Gly Pro Val Gln Arg Pro Glu Gln
3025 3030 3035 3040

Ala Thr Val Trp Gly Leu Ala Leu Val Ala Ser Leu Glu Arg Gly His

3045 3050 3055

Arg Trp Thr Gly Leu Leu Asp Leu Pro Gln Thr Pro Asp Pro Gln Leu

3060 3065 3070

Arg Pro Arg Leu Val Glu Ala Leu Ala Gly Ala Glu Asp Gln Val Ala

3075 3080 3085

Val Arg Ala Asp Ala Val His Ala Arg Arg Ile Val Pro Thr Pro Val

3090 3095 3100

Thr Gly Ala Gly Pro Tyr Thr Ala Pro Gly Gly Thr Ile Leu Val Thr
3105 3110 3115 3120 

Gly Gly Thr Ala Gly Leu Gly Ala Val Thr Ala Arg Trp Leu Ala Glu 

3125 3130 3135 

Arg Gly Ala Glu His Leu Ala Leu Val Ser Arg Arg Gly Pro Gly Thr 

3140 3145 3150 

Ala Gly Val Asp Glu Val Val Arg Asp Leu Thr Gly Leu Gly Val Arg 

3155 3160 3165 

Val Ser Val His Ser Cys Asp Val Gly Asp Arg Glu Ser Val Gly Ala 

3170 3175 3180 

Leu Val Gln Glu Leu Thr Ala Ala Gly Asp Val Val Arg Gly Val Val
3185 3190 3195 3200

His Ala Ala Gly Leu Pro Gln Gln Val Pro Leu Thr Asp Met Asp Pro

3205 3210 3215

Ala Asp Leu Ala Asp Val Val Ala Val Lys Val Asp Gly Ala Val His 

3220 3225 3230 

Leu Ala Asp Leu Cys Pro Glu Ala Glu Leu Phe Leu Leu Phe Ser Ser 
3235 3240 3245 

Gly Ala Gly Val Trp Gly Ser Ala Arg Gln Gly Ala Tyr Ala Ala Gly

3250 3255 3260 

Asn Ala Phe Leu Asp Ala Phe Ala Arg His Arg Arg Asp Arg Gly Leu 
3265 3270 3275 3280 

Pro Ala Thr Ser Val Ala Trp Gly Leu Trp Ala Ala Gly Gly Met Thr 

3285 3290 3295 

Gly Asp Gln Glu Ala Val Ser Phe Leu Arg Glu Arg Gly Val Arg Pro

3300 3305 3310 

Met Ser Val Pro Arg Ala Leu Glu Ala Leu Glu Arg Val Leu Thr Ala 

3315 3320 3325 

Gly Glu Thr Ala Val Val Val Ala Asp Val Asp Trp Ala Ala Phe Ala 

3330 3335 3340 

Glu Ser Tyr Thr Ser Ala Arg Pro Arg Pro Leu Leu His Arg Leu Val 
3345 3350 3355 3360 

Thr Pro Ala Ala Ala Val Gly Glu Arg Asp Glu Pro Arg Glu Gln Thr

3365 3370 3375 

Leu Arg Asp Arg Leu Ala Ala Leu Pro Arg Ala Glu Arg Ser Ala Glu 

3380 3385 3390 

Leu Val Arg Leu Val Arg Arg Asp Ala Ala Ala Val Leu Gly Ser Asp 

3395 3400 3405 

Ala Lys Ala Val Pro Ala Thr Thr Pro Phe Lys Asp Leu Gly Phe Asp 

3410 3415 3420 

Ser Leu Ala Ala Val Arg Phe Arg Asn Arg Leu Ala Ala His Thr Gly 
3425 3430 3435 3440 

Leu Arg Leu Pro Ala Thr Leu Val Phe Glu His Pro Asn Ala Ala Ala 

3445 3450 3455 

Val Ala Asp Leu Leu His Asp Arg Leu Gly Glu Ala Gly Glu Pro Thr 

3460 3465 3470 

Pro Val Arg Ser Val Gly Ala Gly Leu Ala Ala Leu Glu Gln Ala Leu

3475 3480 3485 

Pro Asp Ala Ser Asp Thr Glu Arg Val Glu Leu Val Glu Arg Leu Glu 

3490 3495 3500 

Arg Met Leu Ala Gly Leu Arg Pro Glu Ala Gly Ala Gly Ala Asp Ala 
3505 3510 3515 3520 

Pro Thr Ala Gly Asp Asp Leu Gly Glu Ala Gly Val Asp Glu Leu Leu

3525 3530 3535

Asp Ala Leu Glu Arg Glu Leu Asp Ala Arg 

3540 3545 

<210> 14 
<211> 3562 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 14 

Met Thr Asp Asn Asp Lys Val Ala Glu Tyr Leu Arg Arg Ala Thr Leu 

1 5 10 15

Asp Leu Arg Ala Ala Arg Lys Arg Leu Arg Glu Leu Gln Ser Asp Pro

20 25 30

Ile Ala Val Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val His Leu

35 40 45

Pro Gln His Leu Trp Asp Leu Leu Arg Gln Gly His Glu Thr Val Ser

50 55 60

Thr Phe Pro Thr Gly Arg Gly Trp Asp Leu Ala Gly Leu Phe His Pro
65 70 75 80 

Asp Pro Asp His Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu 

85 90 95

Asp Asp Val Ala Gly Phe Asp Ala Glu Phe Phe Gly Ile Ser Pro Arg

100 105 110

Glu Ala Thr Ala Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser

115 120 125

Trp Glu Leu Val Glu Ser Ala Gly Ile Asp Pro His Ser Leu Arg Gly

130 135 140 

Thr Pro Thr Gly Val Phe Leu Gly Val Ala Arg Leu Gly Tyr Gly Glu 
145 150 155 160 

Asn Gly Thr Glu Ala Gly Asp Ala Glu Gly Tyr Ser Val Thr Gly Val 

165 170 175 

Ala Pro Ala Val Ala Ser Gly Arg Ile Ser Tyr Ala Leu Gly Leu Glu

180 185 190

Gly Pro Ser Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala

195 200 205 

Leu His Leu Ala Val Glu Ser Leu Arg Leu Gly Glu Ser Ser Leu Ala 

210 215 220 

Val Val Gly Gly Ala Ala Val Met Ala Thr Pro Gly Val Phe Val Asp 
225 230 235 240 

Phe Ser Arg Gln Arg Ala Leu Ala Ala Asp Gly Arg Ser Lys Ala Phe

245 250 255 

Gly Ala Ala Ala Asp Gly Phe Gly Phe Ser Glu Gly Val Ser Leu Val 

260 265 270 

Leu Leu Glu Arg Leu Ser Glu Ala Glu Ser Asn Gly His Glu Val Leu

275 280 285

Ala Val Ile Arg Gly Ser Ala Leu Asn Gln Asp Gly Ala Ser Asn Gly

290 295 300

Leu Ala Ala Pro Asn Gly Thr Ala Gln Arg Lys Val Ile Arg Gln Ala
305 310 315 320 

Leu Arg Asn Cys Gly Leu Thr Pro Ala Asp Val Asp Ala Val Glu Ala 

325 330 335 

His Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ala Asn Ala Leu

340 345 350 

Leu Asp Thr Tyr Gly Arg Asp Arg Asp Pro Asp His Pro Leu Trp Leu 

355 360 365 

Gly Ser Val Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val

370 375 380

Thr Gly Leu Leu Lys Met Val Leu Ala Leu Arg His Glu Glu Leu Pro 
385 390 395 400 

Ala Thr Leu His Val Asp Glu Pro Thr Pro His Val Asp Trp Ser Ser

405 410 415

Gly Ala Val Arg Leu Ala Thr Arg Gly Arg Pro Trp Arg Arg Gly Asp

420 425 430

Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Ile Ser Gly Thr Asn

435 440 445

Ala His Val Ile Val Glu Glu Ala Pro Glu Arg Thr Thr Glu Arg Thr

450 455 460

Val Gly Gly Asp Val Gly Pro Val Pro Leu Val Val Ser Ala Arg Ser
465 470 475 480

Ala Ala Ala Leu Arg Ala Gln Ala Ala Gln Val Ala Glu Leu Val Glu

485 490 495

Gly Ser Asp Val Gly Leu Ala Glu Val Gly Arg Ser Leu Ala Val Thr

500 505 510

Arg Ala Arg His Glu His Arg Ala Ala Val Val Ala Ser Thr Arg Ala

515 520 525

Glu Ala Val Arg Gly Leu Arg Glu Val Ala Ala Val Glu Pro Arg Gly

530 535 540

Glu Asp Thr Val Thr Gly Val Ala Glu Thr Ser Gly Arg Thr Val Val
545 550 555 560

Phe Leu Phe Pro Gly Gln Gly Ser Gln Trp Val Gly Met Gly Ala Glu

565 570 575

Leu Leu Asp Ser Ala Pro Ala Phe Ala Asp Thr Ile Arg Ala Cys Asp

580 585 590

Glu Ala Met Ala Pro Leu Gln Asp Trp Ser Val Ser Asp Val Leu Arg

595 600 605

Gln Glu Pro Gly Ala Pro Gly Leu Asp Arg Val Asp Val Val Gln Pro

610 615 620

Val Leu Phe Ala Val Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr
625 630 635 640

Gly Val Thr Pro Ala Ala Val Val Gly His Ser Gln Gly Glu Ile Ala

645 650 655

Ala Ala His Val Ala Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Leu

660 665 670

Val Val Gly Arg Ser Arg Leu Leu Arg Ser Leu Ser Gly Gly Gly Gly

675 680 685

Met Ser Ala Val Ala Leu Gly Glu Ala Glu Val Arg Arg Arg Leu Arg

690 695 700

Ser Trp Glu Asp Arg Ile Ser Val Ala Ala Val Asn Gly Pro Arg Ser
705 710 715 720

Val Val Val Ala Gly Glu Pro Glu Ala Leu Arg Glu Trp Gly Arg Glu

725 730 735

Arg Glu Ala Glu Gly Val Arg Val Arg Glu Ile Asp Val Asp Tyr Ala

740 745 750

Ser His Ser Pro Gln Ile Asp Arg Val Arg Asp Glu Leu Leu Thr Val

755 760 765

Thr Gly Glu Ile Glu Pro Arg Ser Ala Glu Ile Thr Phe Tyr Ser Thr

770 775 780

Val Asp Val Arg Ala Val Asp Gly Thr Asp Leu Asp Ala Gly Tyr Trp
785 790 795 800

Tyr Arg Asn Leu Arg Glu Thr Val Arg Phe Ala Asp Ala Met Thr Arg

805 810 815

Leu Ala Asp Ser Gly Tyr Asp Ala Phe Val Glu Val Ser Pro His Pro

820 825 830

Val Val Val Ser Ala Val Ala Glu Ala Val Glu Glu Ala Gly Val Glu

835 840 845

Asp Ala Val Val Val Gly Thr Leu Ser Arg Gly Asp Gly Gly Pro Gly

850 855 860

Ala Phe Leu Arg Ser Ala Ala Thr Ala His Cys Ala Gly Val Asp Val
865 870 875 880

Asp Trp Thr Pro Ala Leu Pro Gly Ala Ala Thr Ile Pro Leu Pro Thr

885 890 895

Thr Gly Leu Ala Glu Ala Leu Ala Arg Arg Asp Gly Thr Phe Arg Gly

980 985 990






Val Leu Ser Trp Val Ala Thr Asp Glu Arg His Val Glu Ala Gly Ala

995 1000 1005

Val Ala Leu Leu Thr Leu Ala Gln Ala Leu Gly Asp Ala Gly Ile Asp

1010 1015 1020

Ala Pro Leu Trp Cys Leu Thr Gln Glu Ala Val Arg Thr Pro Val Asp
1025 1030 1035 1040

Gly Asp Leu Ala Arg Pro Ala Gln Ala Ala Leu His Gly Phe Ala Gln

1045 1050 1055

Val Ala Arg Leu Glu Leu Ala Arg Arg Phe Gly Gly Val Leu Asp Leu

1060 1065 1070

Pro Ala Thr Val Asp Ala Ala Gly Thr Arg Leu Val Ala Ala Val Leu

1075 1080 1085

Ala Gly Gly Gly Glu Asp Val Val Ala Val Arg Gly Asp Arg Leu Tyr

1090 1095 1100

Gly Arg Arg Leu Val Arg Ala Thr Leu Pro Pro Pro Gly Gly Gly Phe
1105 1110 1115 1120

Thr Pro His Gly Thr Val Leu Val Thr Gly Ala Ala Gly Pro Val Gly

1125 1130 1135

Gly Arg Leu Ala Arg Trp Leu Ala Glu Arg Gly Ala Thr Arg Leu Val

1140 1145 1150

Leu Pro Gly Ala His Pro Gly Glu Glu Leu Leu Thr Ala Ile Arg Ala

1155 1160 1165

Ala Gly Ala Thr Ala Val Val Cys Glu Pro Glu Ala Glu Ala Leu Arg

1170 1175 1180

Thr Ala Ile Gly Gly Glu Leu Pro Thr Ala Leu Val His Ala Glu Thr
1185 1190 1195 1200

Leu Thr Asn Phe Ala Gly Val Ala Asp Ala Asp Pro Glu Asp Phe Ala

1205 1210 1215

Ala Thr Val Ala Ala Lys Thr Ala Leu Pro Thr Val Leu Ala Glu Val

1220 1225 1230

Leu Gly Asp His Arg Leu Glu Arg Glu Val Tyr Cys Ser Ser Val Ala

1235 1240 1245

Gly Val Trp Gly Gly Val Gly Met Ala Ala Tyr Ala Ala Gly Ser Ala

1250 1255 1260

Tyr Leu Asp Ala Leu Val Glu His Arg Arg Ala Arg Gly His Ala Ser
1265 1270 1275 1280

Ala Ser Val Ala Trp Thr Pro Trp Ala Leu Pro Gly Ala Val Asp Asp

1285 1290 1295

Gly Arg Leu Arg Glu Arg Gly Leu Arg Ser Leu Asp Val Ala Asp Ala

1300 1305 1310

Leu Gly Thr Trp Glu Arg Leu Leu Arg Ala Gly Ala Val Ser Val Ala

1315 1320 1325

Val Ala Asp Val Asp Trp Ser Val Phe Thr Glu Gly Phe Ala Ala Ile

1330 1335 1340

Arg Pro Thr Pro Leu Phe Asp Glu Leu Leu Asp Arg Arg Gly Asp Pro
1345 1350 1355 1360



Asp Gly Ala Pro Val Asp Arg Pro Gly Glu Pro Ala Gly Glu Trp Gly 

1365 1370 1375 



Arg Arg Ile Ala Ala Leu Ser Pro Gln Glu Gln Arg Glu Thr Leu Leu

1380 1385 1390

Thr Leu Val Gly Glu Thr Val Ala Glu Val Leu Gly His Glu Thr Gly 

1395 1400 1405 

Thr Glu Ile Asn Thr Arg Arg Ala Phe Ser Glu Leu Gly Leu Asp Ser

1410 1415 1420

Leu Gly Ser Met Ala Leu Arg Gln Arg Leu Ala Ala Arg Thr Gly Leu
1425 1430 1435 1440

Arg Met Pro Ala Ser Leu Val Phe Asp His Pro Thr Val Thr Ala Leu

1445 1450 1455

Ala Arg Tyr Leu Arg Arg Leu Val Val Gly Asp Ser Asp Pro Thr Pro 

1460 1465 1470 

Val Arg Val Phe Gly Pro Thr Asp Glu Ala Glu Pro Val Ala Val Val 

1475 1480 1485 

Gly Ile Gly Cys Arg Phe Pro Gly Gly Ile Ala Thr Pro Glu Asp Leu

1490 1495 1500

Trp Arg Val Val Ser Glu Gly Thr Ser Ile Thr Thr Gly Phe Pro Thr
1505 1510 1515 1520 

Asp Arg Gly Trp Asp Leu Arg Arg Leu Tyr His Pro Asp Pro Asp His 

1525 1530 1535 

Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu Asp Gly Ala Pro 

1540 1545 1550 

Asp Phe Asp Pro Gly Phe Phe Gly Ile Thr Pro Arg Glu Ala Leu Ala

1555 1560 1565

Met Asp Pro Gln Gln Arg Leu Thr Leu Glu Ile Ala Trp Glu Ala Val

1570 1575 1580

Glu Arg Ala Gly Ile Asp Pro Glu Thr Leu Leu Gly Ser Asp Thr Gly
1585 1590 1595 1600

Val Phe Val Gly Met Asn Gly Gln Ser Tyr Leu Gln Leu Leu Thr Gly

1605 1610 1615

Glu Gly Asp Arg Leu Asn Gly Tyr Gln Gly Leu Gly Asn Ser Ala Ser

1620 1625 1630

Val Leu Ser Gly Arg Val Ala Tyr Thr Phe Gly Trp Glu Gly Pro Ala

1635 1640 1645

Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Ile His Leu

1650 1655 1660

Ala Met Gln Ser Leu Arg Arg Gly Glu Cys Ser Leu Ala Leu Ala Gly
1665 1670 1675 1680 

Gly Val Thr Val Met Ala Asp Pro Tyr Thr Phe Val Asp Phe Ser Ala 

1685 1690 1695 

Gln Arg Gly Leu Ala Ala Asp Gly Arg Cys Lys Ala Phe Ser Ala Gln

1700 1705 1710 

Ala Asp Gly Phe Ala Leu Ala Glu Gly Val Ala Ala Leu Val Leu Glu 

1715 1720 1725 

Pro Leu Ser Lys Ala Arg Arg Asn Gly His Gln Val Leu Ala Val Leu

1730 1735 1740

Arg Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala
1745 1750 1755 1760

Pro Asn Gly Pro Ser Gln Glu Arg Val Ile Arg Gln Ala Leu Thr Ala

1765 1770 1775 

Ser Gly Leu Arg Pro Ala Asp Val Asp Met Val Glu Ala His Gly Thr 

1780 1785 1790 

Gly Thr Glu Leu Gly Asp Pro He Glu Ala Gly Ala Leu He Ala Ala 

1795 1800 1805 

Tyr Gly Arg Asp Arg Asp Arg Pro Leu Trp Leu Gly Ser Val Lys Thr 

1810 1815 1820 

Asn He Gly His Thr Gin Ala Ala Ala Gly Ala Ala Gly Val He Lys 
1825 1830 1835 1840 

Ala Val Leu Ala Met Arg His Gly Val Leu Pro Arg Ser Leu His Ala 

1845 1850 1855 

Asp Glu Leu Ser Pro His Ile Asp Trp Ala Asp Gly Lys Val Glu Val

1860 1865 1870

Leu Arg Glu Ala Arg Gln Trp Pro Pro Gly Glu Arg Pro Arg Arg Ala

1875 1880 1885 

Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala His Val lie Val 

1890 1895 1900 

Glu Glu Ala Pro Ala Glu Pro Asp Pro Glu Pro Val Pro Ala Ala Pro 
1905 1910 1915 1920 

Gly Gly Pro Leu Pro Phe Val Leu His Gly Arg Ser Val Gin Thr Val 

1925 1930 1935 

Arg Ser Gin Ala Arg Thr Leu Ala Glu His Leu Arg Thr Thr Gly His 

1940 1945 1950

Arg Asp Leu Ala Asp Thr Ala Arg Thr Leu Ala Thr Gly Arg Ala Arg

1955 1960 1965

Phe Asp Val Arg Ala Ala Val Leu Gly Thr Asp Arg Glu Gly Val Cys 

1970 1975 1980 

Ala Ala Leu Asp Ala Leu Ala Gin Asp Arg Pro Ser Pro Asp Val Val 
1985 1990 1995 2000 

Ala Pro Ala Val Phe Ala Ala Arg Thr Pro Val Leu Val Phe Pro Gly 

2005 2010 2015 

Gin Gly Ser Gin Trp Val Gly Met Ala Arg Asp Leu Leu Asp Ser Ser 

2020 2025 2030 

Glu Val Phe Ala Glu Ser Met Gly Arg Cys Ala Glu Ala Leu Ser Pro 

2035 2040 2045

Tyr Thr Asp Trp Asp Leu Leu Asp Val Val Arg Gly Val Gly Asp Pro 

2050 2055 2060 

Asp Pro Tyr Asp Arg Val Asp Val Leu Gin Pro Val Leu Phe Ala Val 
2065 2070 2075 2080 

Met Val Ser Leu Ala Arg Leu Trp Gin Ser Tyr Gly Val Thr Pro Gly 

2085 2090 2095 

Ala Val Val Gly His Ser Gin Gly Glu lie Ala Ala Ala His Val Ala 

2100 2105 2110 

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser 

2115 2120 2125 

Arg Val Leu Arg Glu Leu Asp Asp Gin Gly Gly Met Val Ser Val Gly 

2130 2135 2140 

Thr Ser Arg Ala Glu Leu Asp Ser Val Leu Arg Arg Trp Asp Gly Arg 
2145 2150 2155 2160 

Val Ala Val Ala Ala Val Asn Gly Pro Gly Thr Leu Val Val Ala Gly 

2165 2170 2175 

Pro Thr Ala Glu Leu Asp Glu Phe Leu Ala Val Ala Glu Ala Arg Glu 

2180 2185 2190 

Met Arg Pro Arg Arg He Ala Val Arg Tyr Ala Ser His Ser Pro Glu 

2195 2200 2205 

Val Ala Arg Val Glu Gin Arg Leu Ala Ala Glu Leu Gly Thr Val Thr 

2210 2215 2220 

Ala Val Gly Gly Thr Val Pro Leu Tyr Ser Thr Ala Thr Gly Asp Leu 
2225 2230 2235 2240 

Leu Asp Thr Thr Ala Met Asp Ala Gly Tyr Trp Tyr Arg Asn Leu Arg 

2245 2250 2255 

Gin Pro Val Leu Phe Glu His Ala Val Arg Ser Leu Leu Glu Arg Gly 

2260 2265 2270 

Phe Glu Thr Phe He Glu Val Ser Pro His Pro Val Leu Leu Met Ala 

2275 2280 2285 

Val Glu Glu Thr Ala Glu Asp Ala Glu Arg Pro Val Thr Gly Val Pro 

2290 2295 2300 

Thr Leu Arg Arg Asp His Asp Gly Pro Ser Glu Phe Leu Arg Asn Leu 
2305 2310 2315 2320 

Leu Gly Ala His Val His Gly Val Asp Val Asp Leu Arg Pro Ala Val 

2325 2330 2335 

Ala His Gly Arg Leu Val Asp Leu Pro Thr Tyr Pro Phe Asp Arg Gin 

2340 2345 2350 

Arg Leu Trp Pro Lys Pro His Arg Arg Ala Asp Thr Ser Ser Leu Gly

2355 2360 2365

Val Arg Asp Ser Thr His Pro Leu Leu His Ala Ala Val Asp Val Pro 

2370 2375 2380 

Gly His Gly Gly Ala Val Phe Thr Gly Arg Leu Ser Pro Asp Glu Gin 
2385 2390 2395 2400 

Gin Trp Leu Thr Gin His Val Val Gly Gly Arg Asn Leu Val Pro Gly 

2405 2410 2415 

Ser Val Leu Val Asp Leu Ala Leu Thr Ala Gly Ala Asp Val Gly Val 

2420 2425 2430 

Pro Val Leu Glu Glu Leu Val Leu Gin Gin Pro Leu Val Leu Thr Ala 

2435 2440 2445 

Ala Gly Ala Leu Leu Arg Leu Ser Val Gly Ala Ala Asp Glu Asp Gly 

2450 2455 2460 

Arg Arg Pro Val Glu lie His Ala Ala Glu Asp Val Ser Asp Pro Ala 
2465 2470 2475 2480 

Glu Ala Arg Trp Ser Ala Tyr Ala Thr Gly Thr Leu Ala Val Gly Val 

2485 2490 2495 

Ala Gly Gly Gly Arg Asp Gly Thr Gin Trp Pro Pro Pro Gly Ala Thr 

2500 2505 2510 

Ala Leu Thr Leu Thr Asp His Tyr Asp Thr Leu Ala Glu Leu Gly Tyr 

2515 2520 2525 

Glu Tyr Gly Pro Ala Phe Gin Ala Leu Arg Ala Ala Trp Gin His Gly 

2530 2535 2540 

Asp Val Val Tyr Ala Glu Val Ser Leu Asp Ala Val Glu Glu Gly Tyr 
2545 2550 2555 2560 

Ala Phe Asp Pro Val Leu Leu Asp Ala Val Ala Gin Thr Phe Gly Leu 

2565 2570 2575 

Thr Ser Arg Ala Pro Gly Lys Leu Pro Phe Ala Trp Arg Gly Val Thr 

2580 2585 2590

Leu His Ala Thr Gly Ala Thr Ala Val Arg Val Val Ala Thr Pro Ala 

2595 2600 2605 

Gly Pro Asp Ala Val Ala Leu Arg Val Thr Asp Pro Thr Gly Gin Leu 

2610 2615 2620 

Val Ala Thr Val Asp Ala Leu Val Val Arg Asp Ala Gly Ala Asp Arg 
2625 2630 2635 2640 

Asp Gin Pro Arg Gly Arg Asp Gly Asp Leu His Arg Leu Glu Trp Val 

2645 2650 2655 

Arg Leu Ala Thr Pro Asp Pro Thr Pro Ala Ala Val Val His Val Ala 

2660 2665 2670 

Ala Asp Gly Leu Asp Asp Leu Leu Arg Ala Gly Gly Pro Ala Pro Gin 

2675 2680 2685 

Ala Val Val Val Arg Tyr Arg Pro Asp Gly Asp Asp Pro Thr Ala Glu 

2690 2695 2700 

Ala Arg His Gly Val Leu Trp Ala Ala Thr Leu Val Arg Arg Trp Leu 
2705 2710 2715 2720 

Asp Asp Asp Arg Trp Pro Ala Thr Thr Leu Val Val Ala Thr Ser Ala 

2725 2730 2735 

Gly Val Glu Val Ser Pro Gly Asp Asp Val Pro Arg Pro Gly Ala Ala 

2740 2745 2750 

Ala Val Trp Gly Val Leu Arg Cys Ala Gin Ala Glu Ser Pro Asp Arg 

2755 2760 2765 

Phe Val Leu Val Asp Gly Asp Pro Glu Thr Pro Pro Ala Val Pro Asp 

2770 2775 2780 

Asn Pro Gin Leu Ala Val Arg Asp Gly Ala Val Phe Val Pro Arg Leu 
2785 2790 2795 2800 

Thr Pro Leu Ala Gly Pro Val Pro Ala Val Ala Asp Arg Ala Tyr Arg 

2805 2810 2815 

Leu Val Pro Gly Asn Gly Gly Ser lie Glu Ala Val Ala Phe Ala Pro 

2820 2825 2830 

Val Pro Asp Ala Asp Arg Pro Leu Ala Pro Glu Glu Val Arg Val Ala 
2835 2840 2845

Val Arg Ala Thr Gly Val Asn Phe Arg Asp Val Leu Leu Ala Leu Gly

2850 2855 2860 

Met Tyr Pro Glu Pro Ala Glu Met Gly Thr Glu Ala Ser Gly Val Val 
2865 2870 2875 2880 

Thr Glu Val Gly Ser Gly Val Arg Arg Phe Thr Pro Gly Gin Ala Val 

2885 2890 2895 

Thr Gly Leu Phe Gin Gly Ala Phe Gly Pro Val Ala Val Ala Asp His 

2900 2905 2910 

Arg Leu Leu Thr Pro Val Pro Asp Gly Trp Arg Ala Val Asp Ala Ala 

2915 2920 2925

Ala Val Pro Ile Ala Phe Thr Thr Ala His Tyr Ala Leu His Asp Leu

2930 2935 2940 

Ala Gly Leu Gin Ala Gly Gin Ser Val Leu Val His Ala Ala Ala Gly 
2945 2950 2955 2960 

Gly Val Gly Met Ala Ala Val Ala Leu Ala Arg Arg Ala Gly Ala Glu 

2965 2970 2975 

Val Phe Ala Thr Ala Ser Pro Ala Lys His Pro Thr Leu Arg Ala Leu 

2980 2985 2990 

Gly Leu Asp Asp Asp His lie Ala Ser Ser Arg Glu Ser Gly Phe Gly 

2995 3000 3005 

Glu Arg Phe Ala Ala Arg Thr Gly Gly Arg Gly Val Asp Val Val Leu 

3010 3015 3020 

Asn Ser Leu Thr Gly Asp Leu Leu Asp Glu Ser Ala Arg Leu Leu Ala 
3025 3030 3035 3040 

Asp Gly Gly Val Phe Val Glu Met Gly Lys Thr Asp Leu Arg Pro Ala 

3045 3050 3055 

Glu Gin Phe Arg Gly Arg Tyr Val Pro Phe Asp Leu Ala Glu Ala Gly 

3060 3065 3070 

Pro Asp Arg Leu Gly Glu lie Leu Glu Glu Val Val Gly Leu Leu Ala 

3075 3080 3085 

Ala Gly Ala Leu Asp Arg Leu Pro Val Ser Val Trp Glu Leu Ser Ala 

3090 3095 3100

Ala Pro Ala Ala Leu Thr His Met Ser Arg Gly Arg His Val Gly Lys 
3105 3110 3115 3120 

Leu Val Leu Thr Gin Pro Ala Pro Val His Pro Asp Gly Thr Val Leu 

3125 3130 3135 

Val Thr Gly Gly Thr Gly Thr Leu Gly Arg Leu Val Ala Arg His Leu 

3140 3145 3150 

Val Thr Gly His Gly Val Pro His Leu Leu Val Ala Ser Arg Arg Gly 

3155 3160 3165 

Pro Ala Ala Pro Gly Ala Ala Glu Leu Arg Ala Asp Val Glu Gly Leu 

3170 3175 3180 

Gly Ala Thr lie Glu He Val Ala Cys Asp Thr Ala Asp Arg Glu Ala 
3185 3190 3195 3200 

Leu Ala Ala Leu Leu Asp Ser He Pro Ala Asp Arg Pro Leu Thr Gly 

3205 3210 3215 

Val Val His Thr Ala Gly Val Leu Ala Asp Gly Leu Val Thr Ser He 

3220 3225 3230 

Asp Gly Thr Ala Thr Asp Gin Val Leu Arg Ala Lys Val Asp Ala Ala 

3235 3240 3245 

Trp His Leu His Asp Leu Thr Arg Asp Ala Asp Leu Ser Phe Phe Val 

3250 3255 3260 

Leu Phe Ser Ser Ala Ala Ser Val Leu Ala Gly Pro Gly Gin Gly Val 
3265 3270 3275 3280 

Tyr Ala Ala Ala Asn Gly Val Leu Asn Ala Leu Ala Gly Gin Arg Arg 

3285 3290 3295 

Ala Leu Gly Leu Pro Ala Lys Ala Leu Gly Trp Gly Leu Trp Ala Gin 

3300 3305 3310 

Ala Ser Glu Met Thr Ser Gly Leu Gly Asp Arg He Ala Arg Thr Gly 

3315 3320 3325 

Val Ala Ala Leu Pro Thr Glu Arg Ala Leu Ala Leu Phe Asp Ala Ala

3330 3335 3340

Leu Arg Ser Gly Gly Glu Val Leu Phe Pro Leu Ser Val Asp Arg Ser
3345 3350 3355 3360 

Ala Leu Arg Arg Ala Glu Tyr Val Pro Glu Val Leu Arg Gly Ala Val 

3365 3370 3375 

Arg Ser Thr Pro Arg Ala Ala Asn Arg Ala Glu Thr Pro Gly Arg Gly 

3380 3385 3390 

Leu Leu Asp Arg Leu Val Gly Ala Pro Glu Thr Asp Gin Val Ala Ala 

3395 3400 3405 

Leu Ala Glu Leu Val Arg Ser His Ala Ala Ala Val Ala Gly Tyr Asp 

3410 3415 3420 

Ser Ala Asp Gin Leu Pro Glu Arg Lys Ala Phe Lys Asp Leu Gly Phe 
3425 3430 3435 3440 

Asp Ser Leu Ala Ala Val Glu Leu Arg Asn Arg Leu Gly Val Thr Thr 

3445 3450 3455 

Gly Val Arg Leu Pro Ser Thr Leu Val Phe Asp His Pro Thr Pro Leu 

3460 3465 3470 

Ala Val Ala Glu His Leu Arg Ser Glu Leu Phe Ala Asp Ser Ala Pro 

3475 3480 3485 

Asp Val Gly Val Gly Ala Arg Leu Asp Asp Leu Glu Arg Ala Leu Asp 

3490 3495 3500 

Ala Leu Pro Asp Ala Gin Gly His Ala Asp Val Gly Ala Arg Leu Glu 
3505 3510 3515 3520 

Ala Leu Leu Arg Arg Trp Gin Ser Arg Arg Pro Pro Glu Thr Glu Pro 

3525 3530 3535 

Val Thr Ile Ser Asp Asp Ala Ser Asp Asp Glu Leu Phe Ser Met Leu

3540 3545 3550

Asp Arg Arg Leu Gly Gly Gly Gly Asp Val 
3555 3560 

<210> 15 
<211> 3201
<212> PRT

<213> Micromonospora megalomicea

195 200 205

Ala Met Glu Ser Leu Arg Arg Asp Glu Cys Thr Leu Val Leu Ala Gly 

210 215 220 

Gly Val Thr Val Met Ser Ser Pro Gly Ala Phe Thr Glu Phe Arg Ser 
225 230 235 240 

Gin Gly Gly Leu Ala Glu Asp Gly Arg Cys Lys Pro Phe Ser Arg Ala 

245 250 255 

Ala Asp Gly Phe Gly Leu Ala Glu Gly Ala Gly Val Leu Val Leu Gin 

260 265 270 

Arg Leu Ser Val Ala Arg Ala Glu Gly Arg Pro Val Leu Ala Val Leu 

275 280 285 

Arg Gly Ser Ala lie Asn Gin Asp Gly Ala Ser Asn Gly Leu Thr Ala 

290 295 300 

Pro Ser Gly Pro Ala Gin Arg Arg Val lie Arg Gin Ala Leu Glu Arg 
305 310 315 320 

Ala Arg Leu Arg Pro Val Asp Val Asp Tyr Val Glu Ala His Gly Thr 

325 330 335 

Gly Thr Arg Leu Gly Asp Pro lie Glu Ala His Ala Leu Leu Asp Thr 

340 345 350 

Tyr Gly Ala Asp Arg Glu Pro Gly Arg Pro Leu Trp Val Gly Ser Val 

355 360 365 

Lys Ser Asn lie Gly His Thr Gin Ala Ala Ala Gly Val Ala Gly Val 

370 375 380 

Met Lys Thr Val Leu Ala Leu Arg His Arg Glu He Pro Ala Thr Leu 
385 390 395 400 

His Phe Asp Glu Pro Ser Pro His Val Asp Trp Asp Arg Gly Ala Val 

405 410 415 

Ser Val Val Ser Glu Thr Arg Pro Trp Pro Val Gly Glu Arg Pro Arg 

420 425 430 

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Val

435 440 445

Ile Val Glu Glu Ala Pro Ser Pro Gln Ala Ala Asp Leu Asp Pro Thr

450 455 460 

Pro Gly Pro Ala Thr Gly Ala Thr Pro Gly Thr Asp Ala Ala Pro Thr 
465 470 475 480 

Ala Glu Pro Gly Ala Glu Ala Val Ala Leu Val Phe Ser Ala Arg Asp 

485 490 495 

Glu Arg Ala Leu Arg Ala Gin Ala Ala Arg Leu Ala Asp Arg Leu Thr 

500 505 510 

Asp Asp Pro Ala Pro Ser Leu Arg Asp Thr Ala Phe Thr Leu Val Thr 

515 520 525 

Arg Arg Ala Thr Trp Glu His Arg Ala Val Val Val Gly Gly Gly Glu 

530 535 540 

Glu Val Leu Ala Gly Leu Arg Ala Val Ala Gly Gly Arg Pro Val Asp 
545 550 555 560 

Gly Ala Val Ser Gly Arg Ala Arg Ala Gly Arg Arg Val Val Leu Val 

565 570 575 

Phe Pro Gly Gin Gly Ala Gin Trp Gin Gly Met Ala Arg Asp Leu Leu 

580 585 590 

Arg Gin Ser Pro Thr Phe Ala Glu Ser He Asp Ala Cys Glu Arg Ala 

595 600 605 

Leu Ala Pro His Val Asp Trp Ser Leu Arg Glu Val Leu Asp Gly Glu 

610 615 620 

Gin Ser Leu Asp Pro Val Asp Val Val Gin Pro Val Leu Phe Ala Val 
625 630 635 640 

Met Val Ser Leu Ala Arg Leu Trp Gin Ser Tyr Gly Val Thr Pro Gly 

645 650 655 

Ala Val Val Gly His Ser Gin Gly Glu He Ala Ala Ala His Val Ala 

660 665 670 

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser 
675 680 685

Arg Val Leu Arg Arg Leu Gly Gly His Gly Gly Met Ala Ser Phe Gly

690 695 700 

Leu His Pro Asp Gln Ala Ala Glu Arg Ile Ala Arg Phe Ala Gly Ala
705 710 715 720

Leu Thr Val Ala Ser Val Asn Gly Pro Arg Ser Val Val Leu Ala Gly

725 730 735

Glu Asn Gly Pro Leu Asp Glu Leu Ile Ala Glu Cys Glu Ala Glu Gly

740 745 750

Val Thr Ala Arg Arg Ile Pro Val Asp Tyr Ala Ser His Ser Pro Gln

755 760 765

Val Glu Ser Leu Arg Glu Glu Leu Leu Ala Ala Leu Ala Gly Val Arg

770 775 780

Pro Val Ser Ala Gly Ile Pro Leu Tyr Ser Thr Leu Thr Gly Gln Val
785 790 795 800

Ile Glu Thr Ala Thr Met Asp Ala Asp Tyr Trp Phe Ala Asn Leu Arg

805 810 815 

Glu Pro Val Arg Phe Gin Asp Ala Thr Arg Gin Leu Ala Glu Ala Gly 

820 825 830 

Phe Asp Ala Phe Val Glu Val Ser Pro His Pro Val Leu Thr Val Gly 

835 840 845 

Val Glu Ala Thr Leu Glu Ala Val Leu Pro Pro Asp Ala Asp Pro Cys 

850 855 860 

Val Thr Gly Thr Leu Arg Arg Glu Arg Gly Gly Leu Ala Gin Phe His 
865 870 875 880 

Thr Ala Leu Ala Glu Ala Tyr Thr Arg Gly Val Glu Val Asp Trp Arg 

885 890 895 

Thr Ala Val Gly Glu Gly Arg Pro Val Asp Leu Pro Val Tyr Pro Phe 

900 905 910 

Gin Arg Gin Asn Phe Trp Leu Pro Val Pro Leu Gly Arg Val Pro Asp 

915 920 925 

Thr Gly Asp Glu Trp Arg Tyr Gin Leu Ala Trp His Pro Val Asp Leu 

930 935 940

Gly Arg Ser Ser Leu Ala Gly Arg Val Leu Val Val Thr Gly Ala Ala 
945 950 955 960 

Val Pro Pro Ala Trp Thr Asp Val Val Arg Asp Gly Leu Glu Gin Arg 

965 970 975 

Gly Ala Thr Val Val Leu Cys Thr Ala Gin Ser Arg Ala Arg He Gly 

980 985 990 

Ala Ala Leu Asp Ala Val Asp Gly Thr Ala Leu Ser Thr Val Val Ser 

995 1000 1005 

Leu Leu Ala Leu Ala Glu Gly Gly Ala Val Asp Asp Pro Ser Leu Asp 

1010 1015 1020 

Thr Leu Ala Leu Val Gin Ala Leu Gly Ala Ala Gly He Asp Val Pro 
1025 1030 1035 1040 

Leu Trp Leu Val Thr Arg Asp Ala Ala Ala Val Thr Val Gly Asp Asp 

1045 1050 1055 

Val Asp Pro Ala Gin Ala Met Val Gly Gly Leu Gly Arg Val Val Gly 

1060 1065 1070 

Val Glu Ser Pro Ala Arg Trp Gly Gly Leu Val Asp Leu Arg Glu Ala 

1075 1080 1085 

Asp Ala Asp Ser Ala Arg Ser Leu Ala Ala He Leu Ala Asp Pro Arg 

1090 1095 1100 

Gly Glu Glu Gln Phe Ala Ile Arg Pro Asp Gly Val Thr Val Ala Arg
1105 1110 1115 1120

Leu Val Pro Ala Pro Ala Arg Ala Ala Gly Thr Arg Trp Thr Pro Arg 

1125 1130 1135 

Gly Thr Val Leu Val Thr Gly Gly Thr Gly Gly Ile Gly Ala His Leu

1140 1145 1150

Ala Arg Trp Leu Ala Gly Ala Gly Ala Glu His Leu Val Leu Leu Asn

1155 1160 1165

Arg Arg Gly Ala Glu Ala Ala Gly Ala Ala Asp Leu Arg Asp Glu Leu

1170 1175 1180

Val Ala Leu Gly Thr Gly Val Thr Ile Thr Ala Cys Asp Val Ala Asp
1185 1190 1195 1200 

Arg Asp Arg Leu Ala Ala Val Leu Asp Ala Ala Arg Ala Gin Gly Arg 

1205 1210 1215 

Val Val Thr Ala Val Phe His Ala Ala Gly lie Ser Arg Ser Thr Ala 

1220 1225 1230 

Val Gin Glu Leu Thr Glu Ser Glu Phe Thr Glu lie Thr Asp Ala Lys 

1235 1240 1245 

Val Arg Gly Thr Ala Asn Leu Ala Glu Leu Cys Pro Glu Leu Asp Ala 

1250 1255 1260 

Leu Val Leu Phe Ser Ser Asn Ala Ala Val Trp Gly Ser Pro Gly Leu 
1265 1270 1275 1280 

Ala Ser Tyr Ala Ala Gly Asn Ala Phe Leu Asp Ala Phe Ala Arg Arg 

1285 1290 1295 

Gly Arg Arg Ser Gly Leu Pro Val Thr Ser He Ala Trp Gly Leu Trp 

1300 1305 1310 

Ala Gly Gin Asn Met Ala Gly Thr Glu Gly Gly Asp Tyr Leu Arg Ser 

1315 1320 1325 

Gin Gly Leu Arg Ala Met Asp Pro Gin Arg Ala He Glu Glu Leu Arg 

1330 1335 1340 

Thr Thr Leu Asp Ala Gly Asp Pro Trp Val Ser Val Val Asp Leu Asp 
1345 1350 1355 1360 

Arg Glu Arg Phe Val Glu Leu Phe Thr Ala Ala Arg Arg Arg Pro Leu 

1365 1370 1375 

Phe Asp Glu Leu Gly Gly Val Arg Ala Gly Ala Glu Glu Thr Gly Gin 

1380 1385 1390 

Glu Ser Asp Leu Ala Arg Arg Leu Ala Ser Met Pro Glu Ala Glu Arg 

1395 1400 1405 

His Glu His Val Ala Arg Leu Val Arg Ala Glu Val Ala Ala Val Leu 

1410 1415 1420 

Gly His Gly Thr Pro Thr Val He Glu Arg Asp Val Ala Phe Arg Asp 
1425 1430 1435 1440 

Leu Gly Phe Asp Ser Met Thr Ala Val Asp Leu Arg Asn Arg Leu Ala 

1445 1450 1455 

Ala Val Thr Gly Val Arg Val Ala Thr Thr He Val Phe Asp His Pro 

1460 1465 1470 

Thr Val Asp Arg Leu Thr Ala His Tyr Leu Glu Arg Leu Val Gly Glu 

1475 1480 1485 

Pro Glu Ala Thr Thr Pro Ala Ala Ala Val Val Pro Gin Ala Pro Gly 

1490 1495 1500 

Glu Ala Asp Glu Pro He Ala He Val Gly Met Ala Cys Arg Leu Ala 
1505 1510 1515 1520 

Gly Gly Val Arg Thr Pro Asp Gin Leu Trp Asp Phe He Val Ala Asp 

1525 1530 1535 

Gly Asp Ala Val Thr Glu Met Pro Ser Asp Arg Ser Trp Asp Leu Asp 

1540 1545 1550 

Ala Leu Phe Asp Pro Asp Pro Glu Arg His Gly Thr Ser Tyr Ser Arg 

1555 1560 1565 

His Gly Ala Phe Leu Asp Gly Ala Ala Asp Phe Asp Ala Ala Phe Phe 

1570 1575 1580 

Gly He Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gin Gin Arg Gin 
1585 1590 1595 1600 

Val Leu Glu Thr Thr Trp Glu Leu Phe Glu Asn Ala Gly He Asp Pro 

1605 1610 1615 

His Ser Leu Arg Gly Thr Asp Thr Gly Val Phe Leu Gly Ala Ala Tyr

1620 1625 1630 

Gin Gly Tyr Gly Gin Asn Ala Gin Val Pro Lys Glu Ser Glu Gly Tyr 

1635 1640 1645 

Leu Leu Thr Gly Gly Ser Ser Ala Val Ala Ser Gly Arg Ile Ala Tyr
1650 1655 1660

Val Leu Gly Leu Glu Gly Pro Ala Ile Thr Val Asp Thr Ala Cys Ser
1665 1670 1675 1680 

Ser Ser Leu Val Ala Leu His Val Ala Ala Gly Ser Leu Arg Ser Gly 

1685 1690 1695 

Asp Cys Gly Leu Ala Val Ala Gly Gly Val Ser Val Met Ala Gly Pro 

1700 1705 1710 

Glu Val Phe Thr Glu Phe Ser Arg Gin Gly Ala Leu Ala Pro Asp Gly 

1715 1720 1725 

Arg Cys Lys Pro Phe Ser Asp Gin Ala Asp Gly Phe Gly Phe Ala Glu 

1730 1735 1740 

Gly Val Ala Val Val Leu Leu Gin Arg Leu Ser Val Ala Val Arg Glu 
1745 1750 1755 1760 

Gly Arg Arg Val Leu Gly Val Val Val Gly Ser Ala Val Asn Gin Asp 

1765 1770 1775 

Gly Ala Ser Asn Gly Leu Ala Ala Pro Ser Gly Val Ala Gin Gin Arg 

1780 1785 1790 

Val lie Arg Arg Ala Trp Gly Arg Ala Gly Val Ser Gly Gly Asp Val 

1795 1800 1805 

Gly Val Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Val 

1810 1815 1820 

Glu Leu Gly Ala Leu Leu Gly Thr Tyr Gly Val Gly Arg Gly Gly Val 
1825 1830 1835 1840

Gly Pro Val Val Val Gly Ser Val Lys Ala Asn Val Gly His Val Gin 

1845 1850 1855 

Ala Ala Ala Gly Val Val Gly Val lie Lys Val Val Leu Gly Leu Gly 

1860 1865 1870 

Arg Gly Leu Val Gly Pro Met Val Cys Arg Gly Gly Leu Ser Gly Leu 

1875 1880 1885 

Val Asp Trp Ser Ser Gly Gly Leu Val Val Ala Asp Gly Val Arg Gly 

1890 1895 1900 

Trp Pro Val Gly Val Asp Gly Val Arg Arg Gly Gly Val Ser Ala Phe 
1905 1910 1915 1920

Gly Val Ser Gly Thr Asn Ala His Val Val Val Ala Glu Ala Pro Gly 

1925 1930 1935 

Ser Val Val Gly Ala Glu Arg Pro Val Glu Gly Ser Ser Arg Gly Leu 

1940 1945 1950 

Val Gly Val Ala Gly Gly Val Val Pro Val Val Leu Ser Ala Lys Thr 

1955 1960 1965 

Glu Thr Ala Leu Thr Glu Leu Ala Arg Arg Leu His Asp Ala Val Asp 

1970 1975 1980 

Asp Thr Val Ala Leu Pro Ala Val Ala Ala Thr Leu Ala Thr Gly Arg 
1985 1990 1995 2000 

Ala His Leu Pro Tyr Arg Ala Ala Leu Leu Ala Arg Asp His Asp Glu 

2005 2010 2015

Leu Arg Asp Arg Leu Arg Ala Phe Thr Thr Gly Ser Ala Ala Pro Gly 

2020 2025 2030 

Val Val Ser Gly Val Ala Ser Gly Gly Gly Val Val Phe Val Phe Pro 

2035 2040 2045 

Gly Gin Gly Gly Gin Trp Val Gly Met Ala Arg Gly Leu Leu Ser Val 

2050 2055 2060 

Pro Val Phe Val Glu Ser Val Val Glu Cys Asp Ala Val Val Ser Ser 
2065 2070 2075 2080 

Val Val Gly Phe Ser Val Leu Gly Val Leu Glu Gly Arg Ser Gly Ala 

2085 2090 2095 

Pro Ser Leu Asp Arg Val Asp Val Val Gin Pro Val Leu Phe Val Val 

2100 2105 2110 

Met Val Ser Leu Ala Arg Leu Trp Arg Trp Cys Gly Val Val Pro Ala 

2115 2120 2125 

Ala Val Val Gly His Ser Gin Gly Glu He Ala Ala Ala Val Val Ala 

2130 2135 2140 

Gly Val Leu Ser Val Gly Asp Gly Ala Arg Val Val Ala Leu Arg Ala
2145 2150 2155 2160

Arg Ala Leu Arg Ala Leu Ala Gly His Gly Gly Met Val Ser Leu Ala 

2165 2170 2175 

Val Ser Ala Glu Arg Ala Arg Glu Leu lie Ala Pro Trp Ser Asp Arg 

2180 2185 2190 

lie Ser Val Ala Ala Val Asn Ser Pro Thr Ser Val Val Val Ser Gly 

2195 2200 2205 

Asp Pro Gin Ala Leu Ala Ala Leu Val Ala His Cys Ala Glu Thr Gly 

2210 2215 2220 

Glu Arg Ala Lys Thr Leu Pro Val Asp Tyr Ala Ser His Ser Ala His 
2225 2230 2235 2240 

Val Glu Gin lie Arg Asp Thr lie Leu Thr Asp Leu Ala Asp Val Thr 

2245 2250 2255 

Ala Arg Arg Pro Asp Val Ala Leu Tyr Ser Thr Leu His Gly Ala Arg 

2260 2265 2270 

Gly Ala Gly Thr Asp Met Asp Ala Arg Tyr Trp Tyr Asp Asn Leu Arg 

2275 2280 2285 

Ser Pro Val Arg Phe Asp Glu Ala Val Glu Ala Ala Val Ala Asp Gly 

2290 2295 2300 

Tyr Arg Val Phe Val Glu Met Ser Pro His Pro Val Leu Thr Ala Ala 
2305 2310 2315 2320 

Val Gln Glu Ile Asp Asp Glu Thr Val Ala Ile Gly Ser Leu His Arg

2325 2330 2335 

Asp Thr Gly Glu Arg His Leu Val Ala Glu Leu Ala Arg Ala His Val 

2340 2345 2350 

His Gly Val Pro Val Asp Trp Arg Ala lie Leu Pro Ala Thr His Pro 

2355 2360 2365 

Val Pro Leu Pro Asn Tyr Pro Phe Glu Ala Thr Arg Tyr Trp Leu Ala 

2370 2375 2380 

Pro Thr Ala Ala Asp Gin Val Ala Asp His Arg Tyr Arg Val Asp Trp 
2385 2390 2395 2400 

Arg Pro Leu Ala Thr Thr Pro Ala Glu Leu Ser Gly Ser Tyr Leu Val

2405 2410 2415 

Phe Gly Asp Ala Pro Glu Thr Leu Gly His Ser Val Glu Lys Ala Gly 

2420 2425 2430 

Gly Leu Leu Val Pro Val Ala Ala Pro Asp Arg Glu Ser Leu Ala Val 

2435 2440 2445 

Ala Leu Asp Glu Ala Ala Gly Arg Leu Ala Gly Val Leu Ser Phe Ala 

2450 2455 2460 

Ala Asp Thr Ala Thr His Leu Ala Arg His Arg Leu Leu Gly Glu Ala 
2465 2470 2475 2480 

Asp Val Glu Ala Pro Leu Trp Leu Val Thr Ser Gly Gly Val Ala Leu 

2485 2490 2495 

Asp Asp His Asp Pro lie Asp Cys Asp Gin Ala Met Val Trp Gly lie 

2500 2505 2510 

Gly Arg Val Met Gly Leu Glu Thr Pro His Arg Trp Gly Gly Leu Val 

2515 2520 2525 

Asp Val Thr Val Glu Pro Thr Ala Glu Asp Gly Val Val Phe Ala Ala 

2530 2535 2540 

Leu Leu Ala Ala Asp Asp His Glu Asp Gin Val Ala Leu Arg Asp Gly 
2545 2550 2555 2560 

Ile Arg His Gly Arg Arg Leu Val Arg Ala Pro Leu Thr Thr Arg Asn

2565 2570 2575

Ala Arg Trp Thr Pro Ala Gly Thr Ala Leu Val Thr Gly Gly Thr Gly 

2580 2585 2590 

Ala Leu Gly Gly His Val Ala Arg Tyr Leu Ala Arg Ser Gly Val Thr 

2595 2600 2605 

Asp Leu Val Leu Leu Ser Arg Ser Gly Pro Asp Ala Pro Gly Ala Ala 

2610 2615 2620 

Glu Leu Ala Ala Glu Leu Ala Asp Leu Gly Ala Glu Pro Arg Val Glu 
2625 2630 2635 2640

Ala Cys Asp Val Thr Asp Gly Pro Arg Leu Arg Ala Leu Val Gln Glu

2645 2650 2655

Leu Arg Glu Gln Asp Arg Pro Val Arg Ile Val Val His Thr Ala Gly

2660 2665 2670

Val Pro Asp Ser Arg Pro Leu Asp Arg Ile Asp Glu Leu Glu Ser Val

2675 2680 2685 

Ser Ala Ala Lys Val Thr Gly Ala Arg Leu Leu Asp Glu Leu Cys Pro 

2690 2695 2700 

Asp Ala Asp Thr Phe Val Leu Phe Ser Ser Gly Ala Gly Val Trp Gly 
2705 2710 2715 2720 

Ser Ala Asn Leu Gly Ala Tyr Ala Ala Ala Asn Ala Tyr Leu Asp Ala

2725 2730 2735 

Leu Ala His Arg Arg Arg Gin Ala Gly Arg Ala Ala Thr Ser Val Ala 

2740 2745 2750 

Trp Gly Ala Trp Ala Gly Asp Gly Met Ala Thr Gly Asp Leu Asp Gly 

2755 2760 2765 

Leu Thr Arg Arg Gly Leu Arg Ala Met Ala Pro Asp Arg Ala Leu Arg 

2770 2775 2780 

Ala Cys Thr Arg Arg Trp Thr Thr His Asp Thr Cys Val Ser Val Ala 
2785 2790 2795 2800 

Asp Val Asp Trp Asp Arg Phe Ala Val Gly Phe Thr Ala Ala Arg Pro

2805 2810 2815 

Arg Pro Leu lie Asp Glu Leu Val Thr Ser Ala Pro Val Ala Ala Pro 

2820 2825 2830 

Thr Ala Ala Ala Ala Pro Val Pro Ala Met Thr Ala Asp Gin Leu Leu 

2835 2840 2845 

Gin Phe Thr Arg Ser His Val Ala Ala He Leu Gly His Gin Asp Pro 

2850 2855 2860 

Asp Ala Val Gly Leu Asp Gin Pro Phe Thr Glu Leu Gly Phe Asp Ser 
2865 2870 2875 2880 

Leu Thr Ala Val Gly Leu Arg Asn Gin Leu Gin Gin Ala Thr Gly Arg 

2885 2890 2895 

Thr Leu Pro Ala Ala Leu Val Phe Gin His Pro Thr Val Arg Arg Leu 

2900 2905 2910 

Ala Asp His Leu Ala Gin Gin Leu Asp Val Gly Thr Ala Pro Val Glu 

2915 2920 2925 

Ala Thr Gly Ser Val Leu Arg Asp Gly Tyr Arg Arg Ala Gly Gin Thr 

2930 2935 2940 

Gly Asp Val Arg Ser Tyr Leu Asp Leu Leu Ala Asn Leu Ser Glu Phe 
2945 2950 2955 2960 

Arg Glu Arg Phe Thr Asp Ala Ala Ser Leu Gly Gly Gin Leu Glu Leu 

2965 2970 2975 

Val Asp Leu Ala Asp Gly Ser Gly Pro Val Thr Val He Cys Cys Ala 

2980 2985 2990 

Gly Thr Ala Ala Leu Ser Gly Pro His Glu Phe Ala Arg Leu Ala Ser 

2995 3000 3005 

Ala Leu Arg Gly Thr Val Pro Val Arg Ala Leu Ala Gin Pro Gly Tyr 

3010 3015 3020 

Glu Ala Gly Glu Pro Val Pro Ala Ser Met Glu Ala Val Leu Gly Val 
3025 3030 3035 3040 

Gin Ala Asp Ala Val Leu Ala Ala Gin Gly Asp Thr Pro Phe Val Leu 

3045 3050 3055 

Val Gly His Ser Ala Gly Ala Leu Met Ala Tyr Ala Leu Ala Thr Glu 

3060 3065 3070 

Leu Ala Asp Arg Gly His Pro Pro Arg Gly Val Val Leu Leu Asp Val 

3075 3080 3085 

Tyr Pro Pro Gly His Gin Glu Ala Val His Ala Trp Leu Gly Glu Leu 

3090 3095 3100 

Thr Ala Ala Leu Phe Asp His Glu Thr Val Arg Met Asp Asp Thr Arg 
3105 3110 3115 3120 

Leu Thr Ala Leu Gly Ala Tyr Asp Arg Leu Thr Gly Arg Trp Arg Pro

3125 3130 3135

Arg Asp Thr Gly Leu Pro Thr Leu Val Val Ala Ala Ser Glu Pro Met 

3140 3145 3150 

Gly Glu Trp Pro Asp Asp Gly Trp Gin Ser Thr Trp Pro Phe Gly His 

3155 3160 3165 

Asp Arg Val Thr Val Pro Gly Asp His Phe Ser Met Val Gin Glu His 

3170 3175 3180 

Ala Asp Ala Ile Ala Arg His Ile Asp Ala Trp Leu Ser Gly Glu Arg
3185 3190 3195 3200 

Ala 



<210> 16 
<211> 358 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 16 

Met Asn Thr Thr Asp Arg Ala Val Leu Gly Arg Arg Leu Gln Met Ile

1 5 10 15

Arg Gly Leu Tyr Trp Gly Tyr Gly Ser Asn Gly Asp Pro Tyr Pro Met

20 25 30

Leu Leu Cys Gly His Asp Asp Asp Pro His Arg Trp Tyr Arg Gly Leu 

35 40 45 

Gly Gly Ser Gly Val Arg Arg Ser Arg Thr Glu Thr Trp Val Val Thr 

50 55 60 

Asp His Ala Thr Ala Val Arg Val Leu Asp Asp Pro Thr Phe Thr Arg 
65 70 75 80 

Ala Thr Gly Arg Thr Pro Glu Trp Met Arg Ala Ala Gly Ala Pro Ala 

85 90 95 

Ser Thr Trp Ala Gln Pro Phe Arg Asp Val His Ala Ala Ser Trp Asp

100 105 110

Ala Glu Leu Pro Asp Pro Gln Glu Val Glu Asp Arg Leu Thr Gly Leu

115 120 125 

Leu Pro Ala Pro Gly Thr Arg Leu Asp Leu Val Arg Asp Leu Ala Trp 

130 135 140 

Pro Met Ala Ser Arg Gly Val Gly Ala Asp Asp Pro Asp Val Leu Arg 
145 150 155 160 

Ala Ala Trp Asp Ala Arg Val Gly Leu Asp Ala Gln Leu Thr Pro Gln

165 170 175

Pro Leu Ala Val Thr Glu Ala Ala Ile Ala Ala Val Pro Gly Asp Pro

180 185 190 

His Arg Arg Ala Leu Phe Thr Ala Val Glu Met Thr Ala Thr Ala Phe 

195 200 205 

Val Asp Ala Val Leu Ala Val Thr Ala Thr Ala Gly Ala Ala Gln Arg

210 215 220 

Leu Ala Asp Asp Pro Asp Val Ala Ala Arg Leu Val Ala Glu Val Leu 
225 230 235 240 

Arg Leu His Pro Thr Ala His Leu Glu Arg Arg Thr Ala Gly Thr Glu 

245 250 255 

Thr Val Val Gly Glu His Thr Val Ala Ala Gly Asp Glu Val Val Val 

260 265 270 

Val Val Ala Ala Ala Asn Arg Asp Ala Gly Val Phe Ala Asp Pro Asp 

275 280 285 

Arg Leu Asp Pro Asp Arg Ala Asp Ala Asp Arg Ala Leu Ser Ala Gln

290 295 300 

Arg Gly His Pro Gly Arg Leu Glu Glu Leu Val Val Val Leu Thr Thr 
305 310 315 320 

Ala Ala Leu Arg Ser Val Ala Lys Ala Leu Pro Gly Leu Thr Ala Gly 

325 330 335 

Gly Pro Val Val Arg Arg Arg Arg Ser Pro Val Leu Arg Ala Thr Ala

340 345 350

His Cys Pro Val Glu Leu

355



<210> 17 
<211> 422 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 17 



Met Arg Val Val Phe Ser Ser Met Ala Ser Lys Ser His Leu Phe Gly

1 5 10 15

Leu Val Pro Leu Ala Trp Ala Phe Arg Ala Ala Gly His Glu Val Arg

20 25 30

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Ile Thr Ala Ala Gly Leu

35 40 45

Thr Ala Val Pro Val Gly Thr Asp Val Asp Leu Val Asp Phe Met Thr

50 55 60

His Ala Gly Tyr Asp Ile Ile Asp Tyr Val Arg Ser Leu Asn Phe

65 70 75 80

Glu Arg Asp Pro Ala Thr Ser Thr Trp Asp His Leu Leu Gly Met Gln

85 90 95

Thr Val Leu Thr Pro Thr Phe Tyr Ala Leu Met Ser Pro Asp Ser Leu

100 105 110

Val Glu Gly Met Ile Ser Phe Cys Arg Ser Trp Arg Pro Asp Trp Ser

115 120 125

Ser Gly Pro Gln Thr Phe Ala Ala Ser Ile Ala Ala Thr Val Thr Gly

130 135 140

Val Ala His Ala Arg Leu Leu Trp Gly Pro Asp Ile Thr Val Arg Ala

145 150 155 160

Arg Gln Lys Phe Leu Gly Leu Leu Pro Gly Gln Pro Ala Ala His Arg

165 170 175

Glu Asp Pro Leu Ala Glu Trp Leu Thr Trp Ser Val Glu Arg Phe Gly

180 185 190

Gly Arg Val Pro Gln Asp Val Glu Glu Leu Val Val Gly Gln Trp Thr

195 200 205

Ile Asp Pro Ala Pro Val Gly Met Arg Leu Asp Thr Gly Leu Arg Thr

210 215 220

Val Gly Met Arg Tyr Val Asp Tyr Asn Gly Pro Ser Val Val Pro Asp

225 230 235 240

Trp Leu His Asp Glu Pro Thr Arg Arg Arg Val Cys Leu Thr Leu Gly

245 250 255

Ile Ser Ser Arg Glu Asn Ser Ile Gly Gln Val Ser Val Asp Asp Leu

260 265 270

Leu Gly Ala Leu Gly Asp Val Asp Ala Glu Ile Ile Ala Thr Val Asp

275 280 285

Glu Gln Gln Leu Glu Gly Val Ala His Val Pro Ala Asn Ile Arg Thr

290 295 300

Val Gly Phe Val Pro Met His Ala Leu Leu Pro Thr Cys Ala Ala Thr

305 310 315 320

Val His His Gly Gly Pro Gly Ser Trp His Thr Ala Ala Ile His Gly

325 330 335

Val Pro Gln Val Ile Leu Pro Asp Gly Trp Asp Thr Gly Val Arg Ala

340 345 350

Gln Arg Thr Glu Asp Gln Gly Ala Gly Ile Ala Leu Pro Val Pro Glu

355 360 365

Leu Thr Ser Asp Gln Leu Arg Glu Ala Val Arg Arg Val Leu Asp Asp

370 375 380

Pro Ala Phe Thr Ala Gly Ala Ala Arg Met Arg Ala Asp Met Leu Ala

385 390 395 400

Glu Pro Ser Pro Ala Glu Val Val Asp Val Cys Ala Gly Leu Val Gly

405 410 415

Glu Arg Thr Ala Val Gly 

420 

<210> 18 
<211> 323 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 18 

Met Ser Thr Asp Ala Thr His Val Arg Leu Gly Arg Cys Ala Leu Leu 

1 5 10 15 

Thr Ser Arg Leu Trp Leu Gly Thr Ala Ala Leu Ala Gly Gln Asp Asp

20 25 30 

Ala Asp Ala Val Arg Leu Leu Asp His Ala Arg Ser Arg Gly Val Asn 

35 40 45 

Cys Leu Asp Thr Ala Asp Asp Asp Ser Ala Ser Thr Ser Ala Gln Val

50 55 60 

Ala Glu Glu Ser Val Gly Arg Trp Leu Ala Gly Asp Thr Gly Arg Arg 
65 70 75 80 

Glu Glu Thr Val Leu Ser Val Thr Val Gly Val Pro Pro Gly Gly Gln

85 90 95

Val Gly Gly Gly Gly Leu Ser Ala Arg Gln Ile Ile Ala Ser Cys Glu

100 105 110 

Gly Ser Leu Arg Arg Leu Gly Val Asp His Val Asp Val Leu His Leu 

115 120 125 

Pro Arg Val Asp Arg Val Glu Pro Trp Asp Glu Val Trp Gln Ala Val

130 135 140 

Asp Ala Leu Val Ala Ala Gly Lys Val Cys Tyr Val Gly Ser Ser Gly 
145 150 155 160 

Phe Pro Gly Trp His Ile Val Ala Ala Gln Glu His Ala Val Arg Arg

165 170 175

His Arg Leu Gly Leu Val Ser His Gln Cys Arg Tyr Asp Leu Thr Ser

180 185 190

Arg His Pro Glu Leu Glu Val Leu Pro Ala Ala Gln Ala Tyr Gly Leu

195 200 205 

Gly Val Phe Ala Arg Pro Thr Arg Leu Gly Gly Leu Leu Gly Gly Asp 

210 215 220 

Gly Pro Gly Ala Ala Ala Ala Arg Ala Ser Gly Gln Pro Thr Ala Leu
225 230 235 240 

Arg Ser Ala Val Glu Ala Tyr Glu Val Phe Cys Arg Asp Leu Gly Glu 

245 250 255 

His Pro Ala Glu Val Ala Leu Ala Trp Val Leu Ser Arg Pro Gly Val 

260 265 270 

Ala Gly Ala Val Val Gly Ala Arg Thr Pro Gly Arg Leu Asp Ser Ala 

275 280 285 

Leu Arg Ala Cys Gly Val Ala Leu Gly Ala Thr Glu Leu Thr Ala Leu 

290 295 300 

Asp Gly Ile Phe Pro Gly Val Ala Ala Ala Gly Ala Ala Pro Glu Ala
305 310 315 320 

Trp Leu Arg 



<210> 19 
<211> 247 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 19 

Met Asn Thr Trp Leu Arg Arg Phe Gly Ser Ala Asp Gly His Arg Ala 
1 5 10 15

Arg Leu Tyr Cys Phe Pro His Ala Gly Ala Ala Ala Asp Ser Tyr Leu
20 25 30

Asp Leu Ala Arg Ala Leu Ala Pro Glu Val Asp Val Trp Ala Val Gln
35 40 45

Tyr Pro Gly Arg Gln Asp Arg Arg Asp Glu Arg Ala Leu Gly Thr Ala
50 55 60

Gly Glu Ile Ala Asp Glu Val Ala Ala Val Leu Arg Asp Leu Val Gly
65 70 75 80

Glu Val Pro Phe Ala Leu Phe Gly His Ser Met Gly Ala Leu Val Ala
85 90 95

Tyr Glu Thr Ala Arg Arg Leu Glu Ala Arg Pro Gly Val Arg Pro Leu
100 105 110

Arg Leu Phe Val Ser Gly Gln Thr Ala Pro Arg Val His Glu Arg Arg
115 120 125

Thr Asp Leu Pro Asp Glu Asp Gly Leu Val Glu Gln Met Arg Arg Leu
130 135 140

Gly Val Ser Glu Ala Ala Leu Ala Asp Gln Gly Leu Leu Asp Met Ser
145 150 155 160

Leu Pro Val Leu Arg Ala Asp His Arg Val Leu Arg Ser Tyr Ala t rn
165 170 175

Gln Ala Gly Pro Pro Leu Arg Ala Gly Ile Thr Thr Leu Cys Gly Asp
180 185 190

Thr Asp Pro Leu Thr Thr Val Glu Asp Ala Gln Arg Trp Leu Pro Tyr
195 200 205

Ser Val Val Pro Gly Arg Thr Arg Thr Phe Pro Gly Gly His Phe Tyr
210 215 220

Leu Ala Asp His Val Gly Glu Val Ala Glu Ser Val Ala Pro Asp Leu
225 230 235 240

Leu Arg Leu Thr Pro Thr Gly
245





<210> 20 
<211> 189 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 20 

Ile Arg Val Gln Asp Asp Asp Ala Asp Arg Leu Ser Arg Asp Glu Leu

1 5 10 15

Thr Ser Ile Ala Leu Val Leu Leu Leu Ala Gly Phe Glu Ala Ser Val

20 25 30

Ser Leu Ile Gly Ile Gly Thr Tyr Leu Leu Leu Thr His Pro Asp Gln

35 40 45 

Leu Ala Leu Val Arg Lys Asp Pro Ala Leu Leu Pro Gly Ala Val Glu 

50 55 60 

Glu Ile Leu Arg Tyr Gln Ala Pro Pro Glu Thr Thr Thr Arg Phe Ala
65 70 75 80

Thr Ala Glu Val Glu Ile Gly Gly Val Thr Ile Pro Ala Tyr Ser Thr

85 90 95

Val Leu Ile Ala Asn Gly Ala Ala Asn Arg Asp Pro Gly Gln Phe Pro

100 105 110 

Asp Pro Asp Arg Phe Asp Val Thr Arg Asp Ser Arg Gly His Leu Thr 

115 120 125 

Phe Gly His Gly Ile His Tyr Cys Met Gly Arg Pro Leu Ala Lys Leu

130 135 140 

Glu Gly Glu Val Ala Leu Gly Ala Leu Phe Asp Arg Phe Pro Lys Leu 
145 150 155 160 

Ser Leu Gly Phe Pro Ser Asp Glu Val Val Trp Arg Arg Ser Leu Leu 

165 170 175 

Leu Arg Gly Ile Asp His Leu Pro Val Arg Pro Asn Gly

180 185 






<210> 21 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic nucleotide DNA duplex 
<400> 21 

taagaattcg gagatctggc ctcagctcta gac 33 

<210> 22 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Complementary oligo 
<400> 22 

aattgtctag agctgaggcc agatctccga attcttaat 39
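The relationship between the duplex oligo of SEQ ID NO: 21 and the complementary oligo of SEQ ID NO: 22 can be checked with a short script. This is only an illustrative sketch: the helper name `reverse_complement` is hypothetical, and the two strings are transcribed from the listing above with the spacing removed.

```python
# Illustrative sketch; the helper name is hypothetical and the sequences
# are copied from SEQ ID NO: 21 and SEQ ID NO: 22 with spaces removed.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


seq21 = "taagaattcg gagatctggc ctcagctcta gac".replace(" ", "")
seq22 = "aattgtctag agctgaggcc agatctccga attcttaat".replace(" ", "")

paired = reverse_complement(seq21)
offset = seq22.find(paired)                 # start of the paired region in SEQ 22
overhang_5 = seq22[:offset]                 # unpaired bases before the paired region
overhang_3 = seq22[offset + len(paired):]   # unpaired bases after the paired region
```

Under these assumptions, the whole of SEQ ID NO: 21 pairs with an internal stretch of SEQ ID NO: 22, and the remaining bases at either end of SEQ ID NO: 22 stay single-stranded in the annealed duplex.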

<210> 23 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 23 

ttgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 
tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 
cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180
gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 
gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 
aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 
ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 
tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 
ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 24 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 24 

ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 
tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 
cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180 
gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 
gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 
aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 
ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 
tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 
ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528

<210> 25 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 
<221> misc_feature
<222> (1)...(528)

<223> Sequence with codon changes as described in the

specification at page 99, line 22 thru 101, line 23

<400> 25 

ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc 60 

tcggccgtca accaagacgg cgcgtcaaac ggcctcgccg cgccctccgg cgtcgcccag 120 

cagcgcgtca tacgccgcgc gtggggacgc gccggagtat cgggcggcga cgtcggagtc 180 

gtcgaggccc acggcaccgg cacccgcctc ggggatcccg tcgagctggg cgccctcctg 240

ggcacgtacg gcgtcggccg cggcggcgtc ggcccggtcg tcgtcggcag cgtcaaggcc 300 

aacgtcggcc acgtccaggc cgcggccggc gtcgtcgggg tcatcaaggt cgtcctcggc 360 

ctcggccgcg ggctggtcgg cccgatggtc tgccgcggcg gcctcagcgg cctcgtcgac 420 

tggtcgtccg gcggcctggt cgtcgcggac ggggtccgcg gctggccggt cggcgtcgac 480 

ggcgtccgcc ggggcggcgt ctcggcgttc ggcgtcagcg ggacgaat 528 
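The "codon changes" noted for SEQ ID NO: 25 can be illustrated by translating the first 60 nucleotides of SEQ ID NO: 24 and SEQ ID NO: 25 and comparing the results. The sketch below makes two assumptions that are not stated in the listing: that the reading frame begins at nucleotide 1, and that the standard genetic code applies. The names `codons` and `translate` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (an assumption; the listing does not state which code applies).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq: str) -> list:
    """Split a sequence into complete codons, reading from position 1."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(seq: str) -> str:
    """Translate a lowercase DNA string into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


# First line (nucleotides 1-60) of SEQ ID NO: 24 and SEQ ID NO: 25.
seq24_head = "ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt".replace(" ", "")
seq25_head = "ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc".replace(" ", "")

same_protein = translate(seq24_head) == translate(seq25_head)
changed = sum(a != b for a, b in zip(codons(seq24_head), codons(seq25_head)))
```

Under these assumptions the two 60-nucleotide stretches encode the same 20 amino acids even though most of the codons differ, which is consistent with the changes being silent. The comparison covers only the first line of each sequence, not the full 528 nucleotides.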

<210> 26 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 

<400> 26

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 
gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 
ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 
gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240
tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 

<210> 27 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 27 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 
gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 
ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180
gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240 
tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 

<210> 28 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 

<221> misc_feature 
<222> (1)...(291)

<223> Sequence with codon changes as described in the 

specification at page 99, line 22 thru page 101, line 23 

<400> 28 

cgtggagtgc gatgcggtcg tgtcgagcgt cgtcggcttc agcgtgctgg gcgtcctgga 60 

gggccgcagc ggcgccccga gcctggaccg cgtcgacgtg gtccagccgg tcctgttcgt 120 

ggtcatggtc agcctggccc gcctgtggcg ctggtgcggc gtggtcccgg ccgccgtggt 180 

cggccacagc cagggcgaga tcgccgccgc ggtcgtggcc ggcgtcctga gcgtcggcga 240 

cggcgcccgc gtcgtggccc tgcgcgcccg cgccctgcgc gccctggccg g 291 

<210> 29 
<211> 24 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> PCR primer 
<400> 29 

gaacaactcc tgtctgcggc cgcg 
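Each record in the listing declares its length in the `<211>` field, so a record can be checked against the residues that follow `<400>`. The sketch below is a minimal, hypothetical parser (the function name `parse_record` and the returned dictionary keys are assumptions) that handles nucleotide records only. It is shown on the SEQ ID NO: 29 record as it appears above.

```python
import re

# The SEQ ID NO: 29 record, transcribed from the listing above.
record = """<210> 29
<211> 24
<212> DNA
<213> Artificial Sequence
<220>
<223> PCR primer
<400> 29
gaacaactcc tgtctgcggc cgcg
"""


def parse_record(text: str) -> dict:
    """Parse one nucleotide record; hypothetical helper for illustration."""
    # Numeric identifiers such as <210> and <211>, each with the text on its line.
    fields = dict(re.findall(r"<(\d{3})>[ \t]*(.*)", text))
    # Everything after the <400> line is sequence data.
    body = text.split("<400>", 1)[1].split("\n", 1)[1]
    # Keep lowercase residue letters; drop spacing and position numbers.
    residues = re.sub(r"[^a-z]", "", body)
    return {
        "id": int(fields["210"]),
        "declared_length": int(fields["211"]),
        "type": fields["212"].strip(),
        "sequence": residues,
    }


rec = parse_record(record)
length_ok = rec["declared_length"] == len(rec["sequence"])
```

A mismatch between the declared and counted lengths would flag a record whose sequence line was damaged or truncated during extraction.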

<210> 30 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 30 

cggaattctc tagagtcacg tctccaaccg cttgtcgagg 

<210> 31 
<211> 51 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 31

tctagactta attaaggagg acacatatga gcgagagcag cggcatgacc g 

<210> 32 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 32 

aacgcctccc aggagatctc cagca 

<210> 33 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 33 

aattcatagc ctaggt 

<210> 34 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 34 
Title

Recombinant Megalomicin Biosynthetic Genes And Uses Thereof

Cross-Reference to Priority Application 
This application claims priority to provisional U.S. patent application 
Serial No. 60/158,305, filed 8 October 1999, and provisional U.S. patent 
application Serial No. 60/190,024, filed 17 March 2000 under 35 U.S.C. § 119(e).
The content of the above referenced applications is incorporated herein by 
reference in its entirety. 



Field of the Invention 
The present invention provides recombinant methods and materials for 
producing polyketides by recombinant DNA technology. The invention relates to 
the fields of agriculture, animal husbandry, chemistry, medicinal chemistry,
medicine, molecular biology, pharmacology, and veterinary technology. 



Background of the Invention 
Polyketides represent a large family of diverse compounds synthesized
from 2-carbon units through a series of condensations and subsequent
modifications. Polyketides occur in many types of organisms, including fungi and
mycelial bacteria, in particular, the actinomycetes. There is a wide variety of
polyketide structures, and the class of polyketides encompasses numerous
compounds with diverse activities. Erythromycin, FK-506, FK-520, megalomicin,
narbomycin, oleandomycin, picromycin, rapamycin, spinocyn, and tylosin are
examples of such compounds. Given the difficulty in producing polyketide
compounds by traditional chemical methodology, and the typically low production
of polyketides in wild-type cells, there has been considerable interest in finding
improved or alternate means to produce polyketide compounds. See PCT
publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; WO 97/02358;
and WO 98/27203; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837;
5,149,639; 5,672,491; and 5,712,146; Fu et al., 1994, Biochemistry 33: 9321-9326;
McDaniel et al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew.
Chem. Int. Ed. Engl. 34(8): 881-888, each of which is incorporated herein by
reference.

Polyketides are synthesized in nature by polyketide synthase (PKS)
enzymes. These enzymes, which are complexes of multiple large proteins, are
similar to the synthases that catalyze condensation of 2-carbon units in the
biosynthesis of fatty acids. PKS enzymes are encoded by PKS genes that usually
consist of three or more open reading frames (ORFs). Two major types of PKS
enzymes are known; these differ in their composition and mode of synthesis.
These two major types of PKS enzymes are commonly referred to as Type I or
"modular" and Type II or "iterative" PKS enzymes.

Modular PKSs are responsible for producing a large number of 12-, 14-,
and 16-membered macrolide antibiotics including erythromycin, megalomicin,
methymycin, narbomycin, oleandomycin, picromycin, and tylosin. Each ORF of a
modular PKS can comprise one, two, or more "modules" of ketosynthase activity,
each module of which consists of at least two (if a loading module) and more
typically three (for the simplest extender module) or more enzymatic activities or
"domains." These large multifunctional enzymes (>300 kDa) catalyze the
biosynthesis of polyketide macrolactones through multistep pathways involving
decarboxylative condensations between acyl thioesters followed by cycles of
varying β-carbon processing activities (see O'Hagan, D., The Polyketide
Metabolites, E. Horwood: New York, 1991, incorporated herein by reference).
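The module and domain organization described above can be sketched as a small data structure. This is an illustrative model only: the class name `Module`, the `is_valid` check, and the example domain compositions are assumptions made for the sketch and are not taken from the sequence data. The domain abbreviations (KS, AT, ACP, KR, DH, ER, TE) are those used elsewhere in this specification.

```python
from dataclasses import dataclass

# Domain abbreviations used in this specification.
KNOWN_DOMAINS = {"KS", "AT", "ACP", "KR", "DH", "ER", "TE"}


@dataclass
class Module:
    """Hypothetical model of one PKS module as an ordered list of domains."""
    name: str
    domains: list
    is_loading: bool = False

    def is_valid(self) -> bool:
        # A loading module has at least two domains; the simplest
        # extender module has at least three.
        minimum = 2 if self.is_loading else 3
        return len(self.domains) >= minimum and set(self.domains) <= KNOWN_DOMAINS


# Placeholder compositions chosen for illustration only.
loading = Module("loading module", ["AT", "ACP"], is_loading=True)
extender = Module("extender module", ["KS", "AT", "KR", "ACP"])
too_small = Module("incomplete extender", ["KS", "AT"])
```

Modelling each module as an ordered list keeps the representation close to the linear arrangement of domains within an ORF, which is the level at which domains are later described as being deleted, inserted, or replaced.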

During the past half decade, the study of modular PKS function and
specificity has been greatly facilitated by the plasmid-based Streptomyces
coelicolor expression system developed with the 6-deoxyerythronolide B (6-dEB)
synthase (DEBS) genes (see Kao et al., 1994, Science 265: 509-512; McDaniel et
al., 1993, Science 262: 1546-1557; and U.S. Patent Nos. 5,672,491 and
5,712,146, each of which is incorporated herein by reference). The advantages of
this plasmid-based genetic system for DEBS are that it overcomes the tedious and
limited techniques for manipulating the natural DEBS host organism,
Saccharopolyspora erythraea, allows more facile construction of recombinant
PKSs, and reduces the complexity of PKS analysis by providing a "clean" host
background. This system also expedited construction of the first combinatorial
modular polyketide library in Streptomyces (see PCT publication No. WO
98/49315, incorporated herein by reference).

The ability to control aspects of polyketide biosynthesis, such as monomer
selection and degree of β-carbon processing, by genetic manipulation of PKSs has
stimulated great interest in the combinatorial engineering of novel antibiotics (see
Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329; Carreras and Santi, 1998,
Curr. Opin. Biotech. 9: 403-411; and U.S. Patent Nos. 5,712,146 and 5,672,491,
each of which is incorporated herein by reference). This interest has resulted in the
cloning, analysis, and manipulation by recombinant DNA technology of genes that
encode PKS enzymes. The resulting technology allows one to manipulate a known
PKS gene cluster either to produce the polyketide synthesized by that PKS at
higher levels than occur in nature or in hosts that otherwise do not produce the
polyketide. The technology also allows one to produce molecules that are
structurally related to, but distinct from, the polyketides produced from known
PKS gene clusters.

Megalomicin is a macrolide antibiotic produced by Micromonospora
megalomicea, a member of the Actinomycetales family of soil bacteria that
produces many types of biologically active compounds. Megalomicin is a
glycoside of erythromycin A, a widely used antibacterial drug with little or no
antimalarial activity. Megalomicin has antibacterial properties similar to those of
erythromycin, and in 1998, it was discovered also to have potent antiparasitic
activity and low toxicity. The antiparasitic activity may be related to the effect
megalomicin has on protein trafficking in eukaryotes, where it appears to inhibit
vesicular transport between the medial and trans-Golgi, resulting in
under-sialylation of proteins. Hence, megalomicin offers an exciting opportunity to
develop a new class of antiparasitic drugs with a different mechanism of action
than the drugs currently in use and, therefore, possibly active against drug-resistant
forms of Plasmodium falciparum.

The number and diversity of megalomicin derivatives have been limited
due to the inability to manipulate the PKS genes, which have not previously been
available in recombinant form. Genetic systems that allow rapid engineering of the
megalomicin biosynthetic genes would be valuable for creating novel compounds
for pharmaceutical, agricultural, and veterinary applications. The production of
such compounds could be more readily accomplished if the heterologous
expression of the megalomicin biosynthetic genes in Streptomyces coelicolor and
S. lividans and other host cells were possible. The present invention meets these
and other needs.

Summary of the Invention 
The present invention provides recombinant methods and materials for
expressing PKS enzymes and polyketide modification enzymes derived in whole
and in part from the megalomicin biosynthetic genes in recombinant host cells.
The invention also provides the polyketides produced by such PKS enzymes. The
invention provides in recombinant form all of the genes for the proteins that
constitute the complete PKS that ultimately results, in Micromonospora
megalomicea, in the production of megalomicin. Thus, in one embodiment, the
invention is directed to recombinant materials comprising nucleic acids with
nucleotide sequences encoding at least one domain, module, or protein encoded by
a megalomicin PKS gene. In one preferred embodiment of the invention, the DNA
compounds of the invention comprise a coding sequence for at least one and
preferably two or more of the domains of the loading module and extender
modules 1 through 6, inclusive, of the megalomicin PKS.

In one embodiment, the invention provides a recombinant expression
vector that comprises a heterologous promoter positioned to drive expression of
one or more of the megalomicin biosynthetic genes. In a preferred embodiment,
the promoter is derived from another PKS gene. In a related embodiment, the
invention provides recombinant host cells comprising one or more expression
vectors that produce(s) megalomicin or a megalomicin derivative or precursor. In
a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor.

In another embodiment, the invention provides a recombinant expression
vector that comprises a promoter positioned to drive expression of a hybrid PKS
comprising all or part of the megalomicin PKS and at least a part of a second PKS.
In a related embodiment, the invention provides recombinant host cells
comprising the vector that produces the hybrid PKS and its corresponding
polyketide. In a preferred embodiment, the host cell is Streptomyces lividans or S.
coelicolor.

In a related embodiment, the invention provides recombinant materials for
the production of libraries of polyketides wherein the polyketide members of the
library are synthesized by hybrid PKS enzymes of the invention. The resulting
polyketides can be further modified to convert them to other useful compounds,
such as antibiotics, motilides, and antiparasitics, typically through hydroxylation
and/or glycosylation. Modified macrolides provided by the invention that are
useful intermediates in the preparation of antiparasitics are of particular benefit.

In another related embodiment, the invention provides a method to prepare
a nucleic acid that encodes a modified PKS, which method comprises using the
megalomicin PKS encoding sequence as a scaffold and modifying the portions of
the nucleotide sequence that encode enzymatic activities, either by mutagenesis,
inactivation, deletion, insertion, or replacement. The thus modified megalomicin
PKS encoding nucleotide sequence can then be expressed in a suitable host cell
and the cell employed to produce a polyketide different from that produced by the
megalomicin PKS. In addition, portions of the megalomicin PKS coding sequence
can be inserted into other PKS coding sequences to modify the products thereof.

In another related embodiment, the invention is directed to a multiplicity of
cell colonies, constituting a library of colonies, wherein each colony of the library
contains an expression vector for the production of a modular PKS derived in
whole or in part from the megalomicin PKS. Thus, at least a portion of the
modular PKS is identical to that found in the PKS that produces megalomicin and
is identifiable as such. The derived portion can be prepared synthetically or
directly from DNA derived from organisms that produce megalomicin. In
addition, the invention provides methods to screen the resulting polyketide and
antibiotic libraries.

The invention also provides novel polyketides, motilides, antibiotics,
antiparasitics and other useful compounds derived therefrom. The compounds of
the invention can also be used in the manufacture of another compound. In a
preferred embodiment, the compounds of the invention are formulated in a
mixture or solution for administration to an animal or human.

In a specific embodiment, the invention provides an isolated nucleic acid
fragment comprising a nucleotide sequence encoding a domain of megalomicin
polyketide synthase (PKS) or a megalomicin modification enzyme. The isolated
nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated nucleic
acid fragment is a recombinant DNA compound.

The isolated nucleic acid fragment can comprise a single, multiple, or all of
the open reading frames (ORFs) of the megalomicin PKS or a megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acid fragment can
also encode a single, multiple, or all of the domains of the megalomicin PKS.
Exemplary domains of the megalomicin PKS include a TE domain, a KS domain,
an AT domain, an ACP domain, a KR domain, a DH domain and an ER domain.
In a preferred embodiment, the nucleic acid fragment encodes a module of the
megalomicin PKS. In another preferred embodiment, the nucleic acid fragment
encodes the loading module, a thioesterase domain, and all six extender modules
of the megalomicin PKS.

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by the
megF, megBV, megCIII, megK, megDI and megG (renamed megY) genes.
Megalomicin modification enzymes also include those enzymes involved in the
biosynthesis of mycarose, megosamine or desosamine, which are used as
biosynthetic intermediates in the biosynthesis of various megalomicin species and
other related polyketides. The enzymes that are involved in biosynthesis of
mycarose, megosamine or desosamine are described in Figures 5 and 10.

In a preferred embodiment, the invention provides an isolated nucleic acid
fragment which hybridizes to a nucleic acid having a nucleotide sequence set forth
in SEQ ID NO:1, under low, medium or high stringency. More preferably, the
nucleic acid fragment comprises, consists of, or consists essentially of a nucleic acid
having a nucleotide sequence set forth in SEQ ID NO:1.

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.
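The identity thresholds above can be made concrete with a simple calculation. The sketch below assumes the two sequences have already been aligned to the same length (with `-` marking gaps) and reports identity as matching positions divided by alignment length. That denominator is one common convention and is an assumption here, since the specification does not define how identity is computed. The function name `percent_identity` and the variant sequence are hypothetical; the wild-type string is the first ten residues of SEQ ID NO: 18 in one-letter code.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over an existing alignment of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    # A position counts as a match only if both residues agree and are not gaps.
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


wild_type = "MSTDATHVRL"   # residues 1-10 of SEQ ID NO: 18
variant = "MSTDGTHVRL"     # hypothetical variant with one substitution

score = percent_identity(wild_type, variant)
meets_60 = score >= 60.0
meets_90 = score >= 90.0
```

Other conventions (for example, dividing by the length of the shorter sequence or excluding gapped columns) can give different values for the same pair, so the convention should be stated alongside any reported figure.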

In still another specific embodiment, the invention provides an antibody, or
a fragment or derivative thereof, which immuno-specifically binds to a domain of
megalomicin polyketide synthase (PKS) or a megalomicin modification enzyme.
The antibody can be a monoclonal or polyclonal antibody or an antibody fragment.
Preferably, the antibody is a monoclonal antibody.

In yet another specific embodiment, the invention provides a recombinant
DNA expression vector comprising the recombinant DNA compound encoding at
least a domain of the megalomicin PKS or a megalomicin modification enzyme,
wherein said domain is operably linked to a promoter. Preferably, the
recombinant DNA expression vector further comprises an origin of replication or a
segment of DNA that enables chromosomal integration.

In yet another specific embodiment, the invention provides a recombinant
host cell comprising the above-described recombinant DNA expression vector
encoding at least a domain of megalomicin PKS or the megalomicin modification
enzyme. The recombinant host cells can be any suitable host cells including
animal, mammalian, plant, fungal, yeast, and bacterial cells. Preferably, the
recombinant host cells are Streptomyces cells, such as Streptomyces lividans and
S. coelicolor cells, or Saccharopolyspora cells, such as Saccharopolyspora erythraea
cells. Also preferably, the recombinant host cells do not produce megalomicin in
their untransformed, non-recombinant state.

When the recombinant host cell contains nucleic acid encoding more than
one megalomicin PKS or megalomicin modification enzyme, or domains thereof,
such nucleic acid material can be located at a single genetic locus, e.g., on a single
plasmid or at a single chromosomal locus, or at different genetic loci, e.g., on
separate plasmids and/or chromosomal loci. In one example, the invention
provides a recombinant host cell, which comprises at least two separate
autonomously replicating recombinant DNA expression vectors, and each of said
vectors comprises a recombinant DNA compound encoding a megalomicin PKS
domain or a megalomicin modification enzyme operably linked to a promoter. In
another example, the invention provides a recombinant host cell, which comprises
at least one autonomously replicating recombinant DNA expression vector and at
least one modified chromosome, and each of said vector(s) and each of said modified
chromosome(s) comprises a recombinant DNA compound encoding a megalomicin
PKS domain or a megalomicin modification enzyme operably linked to a
promoter. Preferably, the autonomously replicating recombinant DNA expression
vector and/or the modified chromosome further comprises distinct selectable
markers.

In a preferred embodiment, the cell comprises three different vectors, one
of which is integrated into the chromosome and two of which are autonomously
replicating, and each of the vectors comprises a meg PKS gene. Optionally, one or
more of the meg PKS genes contains one or more domain alterations, such as a
deletion or substitution of a meg PKS domain with a domain from another PKS.

In yet another specific embodiment, the invention provides a hybrid PKS,
which is produced from a recombinant gene that comprises at least a portion of a
megalomicin PKS gene and at least a portion of a second PKS gene for a
polyketide other than megalomicin. For example, and without limitation, the
second PKS gene can be a narbonolide PKS gene, an oleandolide PKS gene, or a
rapamycin PKS gene. In one embodiment, the hybrid PKS is composed of a
loading module and six extender modules, wherein at least one domain of any one
of extender modules 1 through 6, inclusive, is a domain of an extender module of
megalomicin PKS. In another preferred embodiment, the hybrid PKS comprises a
megalomicin PKS that has a non-functional KS domain in module 1.

In yet another specific embodiment, the invention provides a method of
producing a polyketide, which method comprises growing the recombinant host
cell comprising a recombinant DNA expression vector encoding at least a domain
of the megalomicin PKS or a megalomicin modification enzyme under conditions
whereby the megalomicin PKS domain or the megalomicin modification enzyme
comprised by the recombinant expression vector is produced and the polyketide is
synthesized by the cell, and recovering the synthesized polyketide. Preferably, the
recombinant host cell comprises a recombinant expression vector that encodes at
least a portion of a megAI, megAII, or megAIII gene.

These and other embodiments of the invention are described in more detail
in the following description, the examples, and claims set forth below.

Brief Description of the Figures 
Figure 1 shows restriction site and function maps of the insert DNA in
cosmids pKOS079-138B, pKOS079-93D, pKOS079-93A, and pKOS079-124B of
the invention. Various restriction sites (XhoI, BglII, NsiI) are also shown. The
location of the megalomicin biosynthetic genes is shown below the solid lines
indicating the cosmid inserts. The genes are shown as arrows pointing in the
direction of transcription. The approximate size (in kilobase (kb) pairs) of the gene
cluster is indicated in 5000 bp (i.e., 5K, 10K, and the like) increments on a solid
bar beneath the arrows indicating the genes.
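A restriction site map of the kind shown in Figure 1 is built by locating enzyme recognition sequences in a DNA sequence. The sketch below is illustrative only and does not reproduce the figure. The function name `find_sites` is hypothetical, the recognition sequences are the commonly cited ones for these three enzymes, and the 33-nucleotide oligo of SEQ ID NO: 21 is used merely as a short test input.

```python
# Commonly cited recognition sequences for the enzymes named in Figure 1.
SITES = {"XhoI": "ctcgag", "BglII": "agatct", "NsiI": "atgcat"}


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return 0-based start positions of each recognition sequence."""
    seq = seq.lower()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
        hits[enzyme] = positions
    return hits


# SEQ ID NO: 21, used here only as a convenient short input.
oligo = "taagaattcg gagatctggc ctcagctcta gac".replace(" ", "")
hits = find_sites(oligo)
```

All three recognition sequences are palindromic, so scanning one strand is sufficient for them; enzymes with non-palindromic sites would also require scanning the reverse complement.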

Figure 2 shows a more detailed map of the megalomicin biosynthetic gene
cluster. The various open reading frames are shown as arrows pointing in the
direction of transcription. A line indicates the size in base pairs (in 1000 bp
increments) of the gene cluster. The various domains of the megalomicin PKS are
also shown. Other genes of the megalomicin biosynthetic gene cluster not shown
in this Figure are located in the insert DNA of cosmids pKOS0138B and
pKOS0124B.

Figure 3 shows the structures of the megalomicins, azithromycin and
erythromycin A.

Figure 4 shows the modules and domains of DEBS and the megalomicin
PKS.

Figure 5 shows the compounds and reactions in the erythromycin
biosynthetic pathway and also for megalomicin biosynthesis. Genes that produce
the various enzymes that catalyze each of the steps in the biosynthetic pathway are
indicated.

Figure 6 shows the biosynthetic pathway for the formation of desosamine,
rhodosamine, and mycarose, as well as the genes that produce the various enzymes
that catalyze each of the steps in the biosynthetic pathway.

Figure 7 depicts nucleotide and amino acid sequence of Micromonospora
megalomicea megalomicin biosynthetic genes (GenBank Accession No.
AF263245, incorporated herein by reference).

Figure 8 depicts the biosynthesis of the erythromycins and megalomicins
and the enzymes that mediate the biosynthesis of each.

Figure 9 depicts the cloned megalomicin biosynthetic gene cluster and
certain cosmids of the invention that comprise portions of the cluster.

Figure 10 depicts the biosynthesis of megosamine, mycarose, and
desosamine.

Detailed Description of the Invention 
The present invention provides useful compounds and methods for
producing polyketides in recombinant host cells. As used herein, the term
recombinant refers to a compound or composition produced by human
intervention. The invention provides recombinant DNA compounds encoding all
or a portion of the megalomicin biosynthetic genes. The invention provides
recombinant expression vectors useful in producing the megalomicin PKS and
hybrid PKSs composed of a portion of the megalomicin PKS in recombinant host
cells. The invention also provides the polyketides produced by the recombinant
PKS and polyketide modification enzymes.

To appreciate the many and diverse benefits and applications of the
invention, the description of the invention below is organized as follows. In
Section I, common definitions used throughout this application are provided. In
Section II, structural and functional characteristics of megalomicin are described.
In Section III, the recombinant megalomicin biosynthetic genes and other
recombinant nucleic acids provided by the invention are described. In Section IV,
polypeptides and proteins encoded by the megalomicin biosynthetic genes and
antibodies that specifically bind to such polypeptides and proteins provided by the
invention are described. In Section V, methods for heterologous expression of the
megalomicin biosynthetic genes provided by the invention are described. In
Section VI, the hybrid PKS genes provided by the invention are described. In
Section VII, host cells containing multiple megalomicin biosynthetic genes and
nucleic acid fragments on separate expression vectors provided by the invention are
described. In Section VIII, the polyketide compounds provided by the invention
and pharmaceutical compositions of those compounds are described. The detailed
description is followed by working examples illustrating the invention.

Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as is commonly understood by one of ordinary skill in the
art to which this invention belongs. All patents, applications, published
applications and other publications and sequences from GenBank and other
databases referred to herein are incorporated by reference in their entirety.

Section I. Definitions 

As used herein, domain refers to a portion of a molecule, e.g., proteins or
nucleic acids, that is structurally and/or functionally distinct from another portion
of the molecule.

As used herein, antibody includes antibody fragments, such as Fab 
fragments, which are composed of a light chain and the variable region of a heavy 
chain. 

As used herein, biological activity refers to the in vivo activities of a
compound or physiological responses that result upon in vivo administration of a
compound, composition or other mixture. Biological activity, thus, encompasses
therapeutic effects and pharmaceutical activity of such compounds, compositions
and mixtures. Biological activities may be observed in in vitro systems designed
to test or use such activities.

As used herein, a combination refers to any association between two or
among more items.

As used herein, a composition refers to any mixture. It may be a solution, 
a suspension, liquid, powder, a paste, aqueous, non-aqueous or any combination 
thereof. 

As used herein, derivative or analog of a molecule refers to a portion
derived from or a modified version of the molecule.

As used herein, operably linked, operatively linked or operationally
associated refers to the functional relationship of DNA with regulatory and
effector sequences of nucleotides, such as promoters, enhancers, transcriptional
and translational stop sites, and other signal sequences. For example, operative
linkage of DNA to a promoter refers to the physical and functional relationship
between the DNA and the promoter such that the transcription of such DNA is
initiated from the promoter by an RNA polymerase that specifically recognizes,
binds to and transcribes the DNA. To optimize expression and/or in vitro
transcription, it may be helpful to remove, add or alter 5' untranslated portions of
the clones to eliminate extra, potentially inappropriate alternative translation
initiation (i.e., start) codons or other sequences that may interfere with or reduce
expression, either at the level of transcription or translation. Alternatively,
consensus ribosome binding sites (see, e.g., Kozak, J. Biol. Chem. 266:19867-
19870 (1991)) can be inserted immediately 5' of the start codon and may enhance
expression. The desirability of (or need for) such modification may be empirically
determined.

As used herein, pharmaceutically acceptable salts, esters or other
derivatives of the conjugates include any salts, esters or derivatives that may be
readily prepared by those of skill in this art using known methods for such
derivatization and that produce compounds that may be administered to animals or
humans without substantial toxic effects and that either are pharmaceutically
active or are prodrugs.

As used herein, a promoter region or promoter element refers to a segment
of DNA or RNA that controls transcription of the DNA or RNA to which it is
operatively linked. The promoter region includes specific sequences that are
sufficient for RNA polymerase recognition, binding and transcription initiation.
This portion of the promoter region is referred to as the promoter. In addition, the
promoter region includes sequences that modulate this recognition, binding and
transcription initiation activity of RNA polymerase. These sequences may be cis
acting or may be responsive to trans acting factors. Promoters, depending upon
the nature of the regulation, may be constitutive or regulated.

As used herein, stringency of hybridization in determining percentage
mismatch is as follows: (1) high stringency: 0.1 x SSPE, 0.1% SDS, 65°C; (2)
medium stringency: 0.2 x SSPE, 0.1% SDS, 50°C; and (3) low stringency: 1.0 x
SSPE, 0.1% SDS, 50°C. Equivalent stringencies may be achieved using alternative
buffers, salts and temperatures.

The term substantially identical or homologous or similar varies with the
context as understood by those skilled in the relevant art and generally means at
least 70%, preferably means at least 80%, more preferably at least 90%, and most
preferably at least 95% identity.

As used herein, substantially identical to a product means sufficiently
similar so that the property of interest is sufficiently unchanged so that the
substantially identical product can be used in place of the product.
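
As a rough illustration of the percent identity tiers recited above for
substantially identical sequences, the following Python sketch computes percent
identity over two pre-aligned sequences and reports the strictest tier met. The
function names, the use of '-' as a gap character, and the choice to count every
aligned column in the denominator are assumptions made for this example; the
specification does not prescribe an alignment method or an identity formula.

```python
# Identity tiers recited in the definition of "substantially identical",
# strictest first. The labels are shorthand chosen for this example.
TIERS = [
    (95.0, "most preferably"),
    (90.0, "more preferably"),
    (80.0, "preferably"),
    (70.0, "generally"),
]


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns holding identical residues.

    Assumes a and b are already aligned: equal length, '-' marks a gap.
    Columns where both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def identity_tier(pct: float):
    """Return the strictest tier label met, or None if below 70%."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    return None
```

For example, two aligned 10-residue sequences differing at one position give
90% identity, which falls in the "more preferably" tier under this sketch.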

As used herein, isolated means that a substance is either present in a
preparation at a concentration higher than that substance is found in nature or in its
naturally occurring state or that the substance is present in a preparation that
contains other materials with which the substance is not associated in nature.
As an example of the latter, an isolated meg PKS protein includes a meg PKS
protein expressed in a Streptomyces coelicolor or S. lividans host cell.

As used herein, substantially pure means sufficiently homogeneous to
appear free of readily detectable impurities as determined by standard methods of
analysis, such as thin layer chromatography (TLC), gel electrophoresis and high
performance liquid chromatography (HPLC), used by those of skill in the art to
assess such purity, or sufficiently pure such that further purification would not
detectably alter the physical and chemical properties, such as enzymatic and
biological activities, of the substance. Methods for purification of the compounds
to produce substantially chemically pure compounds are known to those of skill in
the art. A substantially chemically pure compound may, however, be a mixture of
stereoisomers or isomers. In such instances, further purification might increase
the specific activity of the compound.

As used herein, vector or plasmid refers to discrete elements that are used
to introduce heterologous DNA into cells for either expression or replication
thereof. Selection and use of such vehicles are well known within the skill of the
artisan. An expression vector includes vectors capable of expressing DNAs that
are operatively linked with regulatory sequences, such as promoter regions, that
are capable of effecting expression of such DNA fragments. Thus, an expression
vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage,
recombinant virus or other vector that, upon introduction into an appropriate host
cell, results in expression of the cloned DNA. Appropriate expression vectors are
well known to those of skill in the art and include those that are replicable in
eukaryotic cells and/or prokaryotic cells and those that remain episomal or those
which integrate into the host cell genome.

Section II. Megalomicins 
The megalomicins were discovered in 1969 at Schering Corp. as
antibacterial agents produced by Micromonospora megalomicea (see Weinstein et
al., 1969, J. Antibiotics 22: 253-258, and U.S. Patent No. 3,632,750, both of
which are incorporated herein by reference). Although the initial structural
assignment was in error, a thorough reassessment of NMR data coupled with an
X-ray crystal structure of a megalomicin A derivative (see Nakagawa and Omura,
"Structure and Stereochemistry of Macrolides" in Macrolide Antibiotics (S.
Omura, ed.), Academic Press, NY, 1984, incorporated herein by reference)
established the structures shown in Figure 3. The megalomicins are 6-O-
glycosides of erythromycin C with acetyl or propionyl groups esterified at the 3'''
or 4''' hydroxyls of the mycarose sugar at the C-3 position. The C-6 sugar has
been named "megosamine," although it had been identified 5 to 10 years earlier as
L-rhodosamine or N-dimethyldaunosamine, deoxyamino sugars commonly present
in the anthracycline antitumor drugs. The antibacterial potency, spectrum of
activity, and toxicity (LD50 acute, 7-7.5 g/kg s.c. or oral; subacute, >500 mg/kg) of
the megalomicins are similar to those of erythromycin A.

The megalomicins have two modes of biological activity. As antibacterials,
they act like the erythromycins, which inhibit protein synthesis at the translocation
step by selective binding to the bacterial 50S ribosomal RNA. They also affect
protein trafficking in eukaryotic cells (see Bonay et al., 1996, J. Biol. Chem.
271:3719-3726, incorporated herein by reference). Although the mechanism of
action is not entirely clear, it appears to involve inhibition of vesicular transport
between the medial and trans Golgi, resulting in under-sialylation of proteins. The
megalomicins also strongly inhibit the ATP-dependent acidification of lysosomes
in vivo (see Bonay et al., 1997, J. Cell Sci. 110:1839-1849, incorporated herein by
reference) and cause an anomalous glycosylation of viral proteins, which may be
responsible for their antiviral activity against herpes (T0X50, 70-100 µM; see
Alarcon et al., 1984, Antivir. Res. 4:231-243, and Alarcon et al., 1988, FEBS Lett.
231:207-211, both of which are incorporated herein by reference).

Strikingly, the megalomicins are potent antiparasitic agents, showing an
IC50 of 1 µg/ml in blocking intracellular replication of Plasmodium falciparum
in infected erythrocytes (see Bonay et al., 1998, Antimicrob. Agents Chemother.
42:2668-2673, incorporated herein by reference). The megalomicins are effective
against Trypanosoma cruzi and T. brucei (IC50, 0.2-2 µg/ml) plus Leishmania
donovani and L. major promastigotes (IC50, 3 and 8 µg/ml, respectively).
Megalomicin is also active against the intracellular replicative, amastigote form of
T. cruzi, completely preventing its replication in infected murine LLC/MK2
macrophages at a dose of 5 µg/ml. Importantly, the effective drug concentration is
500-fold less than the acute LD50 in mammals, and there is no toxicity to BALB/c
mice at doses (50 mg/kg) that are completely curative for T. brucei infections.
Because the erythromycins do not have such activity, although azithromycin
(Figure 3) has been reported to be an effective acute and prophylactic treatment for
malaria caused by P. vivax and P. falciparum (see Taylor et al., 1999, Clin. Infect.
Dis. 28:74-81, incorporated herein by reference), the antiparasitic action of the
megalomicins is unique and probably related to the presence of the deoxyamino
sugar megosamine at C-6 (Figure 3). Consequently, the megalomicins could be
developed into potent antimalarial drugs with a high therapeutic index and be
active against P. falciparum and other species that are resistant to currently used
classes of antimalarials. They also could lead to potent antiparasitic agents against
leishmaniasis, trypanosomiasis, and Chagas' disease. In view of the widespread
use of the erythromycins and their good oral availability plus the low mammalian
toxicity of macrolides in general, the megalomicins could be used prophylactically
to combat malaria, and as fermentation products, the megalomicins should be
relatively inexpensive to produce.

The megalomicins belong to the polyketide class of natural products whose
members have diverse structural and pharmacological properties (see Monaghan
and Tkacz, 1990, Annu. Rev. Microbiol. 44: 271, incorporated herein by
reference). The megalomicins are assembled by polyketide synthases through
successive condensations of activated coenzyme A thioester monomers derived
from small organic acids such as acetate, propionate, and butyrate. Active sites
required for condensation include an acyltransferase (AT), acyl carrier protein
(ACP), and beta-ketoacylsynthase (KS). Each condensation cycle results in a
beta-keto group that undergoes all, some, or none of a series of processing activities.
Active sites that perform these reactions include a ketoreductase (KR),
dehydratase (DH), and enoylreductase (ER). Thus, the absence of any beta-keto
processing domain results in the presence of a ketone, a KR alone gives rise to a
hydroxyl, a KR and DH result in an alkene, while a KR, DH, and ER combination
leads to complete reduction to an alkane. After assembly of the polyketide chain,
the molecule typically undergoes cyclization(s) and post-PKS modification (e.g.,
glycosylation, oxidation, acylation) to achieve the final active compound.
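
The relationship between the beta-keto processing domains present in a module
and the functional group left on the chain can be restated as a small lookup.
The following Python sketch is illustrative only; the function name and the
decision to reject combinations not mentioned in the text (for example a DH
without a KR) are assumptions made for this example.

```python
def beta_carbon_state(domains):
    """Return the functional group left at the beta-carbon by one module.

    `domains` is any iterable of domain abbreviations found in the module.
    Only KR, DH and ER affect the result; KS, AT and ACP are ignored.
    """
    present = frozenset(domains) & {"KR", "DH", "ER"}
    outcomes = {
        frozenset(): "ketone",                      # no processing domain
        frozenset({"KR"}): "hydroxyl",              # KR alone
        frozenset({"KR", "DH"}): "alkene",          # KR plus DH
        frozenset({"KR", "DH", "ER"}): "alkane",    # full reduction
    }
    if present not in outcomes:
        # Combinations not described in the text are rejected rather than guessed.
        raise ValueError(f"unsupported domain combination: {sorted(present)}")
    return outcomes[present]
```

Calling `beta_carbon_state(["KS", "AT", "KR", "ACP"])` returns `"hydroxyl"`,
matching the "KR alone" case described above.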

Macrolides such as erythromycin and megalomicin are synthesized by
modular PKSs (see Cane et al., 1998, Science 282: 63, incorporated herein by
reference). For illustrative purposes, the PKS that produces the erythromycin
polyketide (6-deoxyerythronolide B synthase or DEBS; see U.S. Patent No.
5,824,513, incorporated herein by reference) is shown in Figure 4. DEBS is the
most characterized and extensively used modular PKS system. DEBS is
particularly relevant to the present invention in that it synthesizes the same
polyketide, 6-deoxyerythronolide B (6-dEB), synthesized by the megalomicin
PKS. In modular PKS enzymes such as DEBS and the megalomicin PKS, the
enzymatic steps for each round of condensation and reduction are encoded within
a single "module" of the polypeptide (i.e., one distinct module for every
condensation cycle). DEBS consists of a loading module and 6 extender modules
and a chain terminating thioesterase (TE) domain within three extremely large
polypeptides encoded by three open reading frames (ORFs, designated eryAI,
eryAII, and eryAIII).

Each of the three polypeptide subunits of DEBS (DEBSI, DEBSII, and
DEBSIII) contains 2 extender modules; DEBSI additionally contains the loading
module. Collectively, these proteins catalyze the condensation and appropriate
reduction of 1 propionyl CoA starter unit and 6 methylmalonyl CoA extender
units. Modules 1, 2, 5, and 6 contain KR domains; module 4 contains a complete
set, KR/DH/ER, of reductive and dehydratase domains; and module 3 contains no
functional reductive domain. Following the condensation and appropriate
dehydration and reduction reactions, the enzyme bound intermediate is lactonized
by the TE at the end of extender module 6 to form 6-dEB.
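
The DEBS layout just described can be written down as a small data structure to
check the bookkeeping: one propionyl starter plus six methylmalonyl extender
units. The Python sketch below is illustrative only. The variable and function
names are invented for this example, and it relies on the general chemical fact
(not spelled out in the text) that each methylmalonyl unit loses one carbon as
carbon dioxide on condensation, adding two backbone carbons and one methyl
branch. The predicted groups describe the beta-carbon as it leaves each module,
before lactonization and any later tailoring.

```python
# Subunit layout as described: two extender modules per polypeptide,
# with DEBSI also carrying the loading module.
DEBS_SUBUNITS = {
    "DEBSI": ["loading", 1, 2],
    "DEBSII": [3, 4],
    "DEBSIII": [5, 6],
}

# Functional reductive domains per extender module, as stated in the text.
REDUCTIVE_DOMAINS = {
    1: {"KR"}, 2: {"KR"}, 3: set(),
    4: {"KR", "DH", "ER"}, 5: {"KR"}, 6: {"KR"},
}

BETA_STATE = {
    frozenset(): "ketone",
    frozenset({"KR"}): "hydroxyl",
    frozenset({"KR", "DH"}): "alkene",
    frozenset({"KR", "DH", "ER"}): "alkane",
}

STARTER_CARBONS = 3        # propionyl starter unit
BACKBONE_PER_EXTENDER = 2  # each condensation lengthens the chain by two carbons
BRANCH_PER_EXTENDER = 1    # methyl branch from methylmalonyl (assumption noted above)


def extender_modules():
    """Extender module numbers in subunit order."""
    return [m for mods in DEBS_SUBUNITS.values() for m in mods if m != "loading"]


def predicted_beta_states():
    """Beta-carbon functional group produced by each extender module."""
    return {m: BETA_STATE[frozenset(REDUCTIVE_DOMAINS[m])] for m in extender_modules()}


def carbon_counts():
    """Return (backbone carbons, total carbons) of the full-length chain."""
    n = len(extender_modules())
    backbone = STARTER_CARBONS + BACKBONE_PER_EXTENDER * n
    return backbone, backbone + BRANCH_PER_EXTENDER * n
```

Under these assumptions the sketch gives a 15-carbon backbone and 21 carbons in
total, which agrees with the 21-carbon skeleton of 6-dEB.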

More particularly, the loading module of DEBS consists of two domains,
an acyltransferase (AT) domain and an acyl carrier protein (ACP) domain. In
other PKS enzymes, the loading module is not composed of an AT and an ACP
but instead utilizes an inactivated KS, an AT, and an ACP. This inactivated KS is
in most instances called KS^Q, where the superscript letter is the abbreviation for
the amino acid, glutamine, that is present instead of the active site cysteine
required for activity. The AT domain of the loading module recognizes a
particular acyl-CoA (propionyl for DEBS, which can also accept acetyl) and
transfers it as a thiol ester to the ACP of the loading module. Concurrently, the AT
on each of the extender modules recognizes a particular extender-CoA
(methylmalonyl for DEBS) and transfers it to the ACP of that module to form a
thioester. Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of
the loading module migrates to form a thiol ester (trans-esterification) at the KS of
the first extender module; at this stage, extender module 1 possesses an acyl-KS
and a methylmalonyl ACP. The acyl group derived from the loading module is
then covalently attached to the alpha-carbon of the malonyl group to form a
carbon-carbon bond, driven by concomitant decarboxylation, and generating a new
acyl-ACP that has a backbone two carbons longer than the loading unit
(elongation or extension). The growing polyketide chain is transferred from the
ACP to the KS of the next module, and the process continues.

The polyketide chain, growing by two carbons each module, is sequentially
passed as a covalently bound thiol ester from module to module, in an assembly
line-like process. The carbon chain produced by this process alone would possess
a ketone at every other carbon atom, producing a polyketone, from which the
name polyketide arises. Commonly, however, the beta keto group of each two-
carbon unit is modified just after it has been added to the growing polyketide
chain but before it is transferred to the next module by either a KR, a KR plus a
DH, or a KR, a DH, and an ER. As noted above, modules may contain additional
enzymatic activities as well.

Once a polyketide chain traverses the final extender module of a PKS, it
encounters the releasing domain or thioesterase found at the carboxyl end of most
PKSs. Here, the polyketide is cleaved from the enzyme and cyclized. The
resulting polyketide can be modified further by tailoring or modification enzymes;
these enzymes add carbohydrate groups or methyl groups, or make other
modifications, e.g., oxidation or reduction, on the polyketide core molecule. For
example, the final steps in conversion of 6-dEB to erythromycin A include the
actions of a number of modification enzymes, such as: C-6 hydroxylation,
attachment of mycarose and desosamine sugars, C-12 hydroxylation (which
produces erythromycin C), and conversion of mycarose to cladinose via O-
methylation, as shown in Figure 5.

With this overview of PKS and post-PKS modification enzymes, one can 
better appreciate the recombinant megalomicin biosynthetic genes provided by the 
invention and their function, as described in the following Section. 

Section III. The Megalomicin Biosynthetic Genes and Nucleic Acid Fragments

The megalomicin PKS was isolated and cloned by the following
procedure. Genomic DNA was isolated from a megalomicin producing strain of
Micromonospora megalomicea subsp. nigra (ATCC 27598), partially digested
with a restriction enzyme, and cloned into a commercially available cosmid vector
to produce a genomic library. This library was then probed with probes generated
from the erythromycin biosynthetic genes as well as from cosmids identified as
containing sequences homologous to erythromycin biosynthetic genes. This
probing identified a set of cosmids, which were analyzed by DNA sequence
analysis and restriction enzyme digestion, which revealed that the desired DNA
had been isolated and that the entire PKS gene cluster was contained in
overlapping segments on four of the cosmids identified. Figure 1 shows the
cosmids, and the portions of the megalomicin biosynthetic gene cluster in the
insert DNA of the cosmids. Figure 1 shows that the complete megalomicin
biosynthetic gene cluster is contained within the insert DNA of cosmids
pKOS079-138B, pKOS079-124B, pKOS079-93D, and pKOS079-93A. Each of
these cosmids has been deposited with the American Type Culture Collection in
accordance with the terms of the Budapest Treaty (cosmid pKOS079-138B is
available under accession no. ATCC ____; cosmid pKOS079-124B is available
under accession no. ATCC ____; cosmid pKOS079-93D is available under
accession no. ATCC ____; and cosmid pKOS079-93A is available under
accession no. ATCC ____). Various additional reagents of the invention can be
isolated from these cosmids. DNA sequence analysis was also performed on the
various subclones of the invention, as described herein. Further analysis of these
cosmids and subclones prepared from the cosmids facilitated the identification of
the location of various megalomicin biosynthetic genes, including the ORFs
encoding the PKS, modules encoded by those ORFs, and coding sequences for
megalomicin modification enzymes. The location of these genes and modules is
shown on Figure 2.

Those of skill in the art will recognize that, due to the degenerate nature of
the genetic code, a variety of DNA compounds differing in their nucleotide
sequences can be used to encode a given amino acid sequence of the invention.
The native DNA sequence encoding the megalomicin PKS and other biosynthetic
enzymes of Micromonospora megalomicea is
shown herein merely to illustrate a preferred embodiment of the invention, and the
invention includes DNA compounds of any sequence that encode the amino acid
sequences of the polypeptides and proteins of the invention. In similar fashion, a
polypeptide can typically tolerate one or more amino acid substitutions, deletions,
and insertions in its amino acid sequence without loss or significant loss of a
desired activity. The present invention includes such polypeptides with alternate
amino acid sequences, and the amino acid sequences encoded by the DNA
sequences shown herein merely illustrate preferred embodiments of the invention.
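
To give a sense of scale for the degeneracy point made above, the following
Python sketch counts how many distinct DNA coding sequences specify a given
peptide. It is illustrative only: it assumes the standard genetic code, ignores
stop codons and codon usage bias, and the function name is invented for this
example.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, written in the conventional TCAG ordering
# (first base varies slowest, third base fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons available for each amino acid.
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences encoding `peptide` (one-letter codes)."""
    for aa in peptide:
        if aa == "*" or aa not in DEGENERACY:
            raise ValueError(f"not a standard amino acid: {aa!r}")
    return prod(DEGENERACY[aa] for aa in peptide)
```

Even a short peptide has many possible coding sequences; a tripeptide of
methionine, leucine and serine, for instance, can be encoded in 1 x 6 x 6 = 36
ways under the standard code.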

The recombinant nucleic acids, proteins, and peptides of the invention are
many and diverse. To facilitate an understanding of the invention and the diverse
compounds and methods provided thereby, the following description of the
various regions of the megalomicin PKS and the megalomicin modification
enzymes and corresponding coding sequences is provided. To facilitate description
of the invention, reference to a PKS, protein, module, or domain herein can also
refer to DNA compounds comprising coding sequences therefor and vice versa.
Also, unless otherwise indicated, reference to a heterologous PKS refers to a PKS
or DNA compounds comprising coding sequences therefor from an organism
other than Micromonospora megalomicea. In addition, reference to a PKS or its
coding sequence includes reference to any portion thereof.

Thus, the invention provides DNA molecules in isolated (i.e., not pure, but
existing in a preparation in an abundance and/or concentration not found in nature)
and purified (i.e., substantially free of contaminating materials or substantially free
of materials with which the corresponding DNA would be found in nature) form.
The DNA molecules of the invention comprise one or more sequences that encode
one or more domains (or fragments of such domains) of one or more modules in
one or more of the ORFs of the megalomicin PKS and sequences that encode
megalomicin modification enzymes from the megalomicin biosynthetic gene
cluster. Examples of PKS domains include the KS, AT, DH, KR, ER, ACP, and
TE domains of at least one of the 6 extender modules and loading module of the
three proteins encoded by the three ORFs of the megalomicin PKS gene cluster.
Examples of megalomicin modification enzymes include those that synthesize the
mycarose, desosamine, and megosamine moieties, those that transfer those sugar
moieties to the polyketide 6-dEB, those that hydroxylate the polyketide at C-6 and
C-12, and those that acylate the sugar moieties.

In an especially preferred embodiment, the DNA molecule is a
recombinant DNA expression vector or plasmid, as described in more detail in the
following Section. Generally, such vectors can either replicate in the cytoplasm of
the host cell or integrate into the chromosomal DNA of the host cell. In either
case, the vector can be a stable vector (i.e., the vector remains present over many
cell divisions, even if only with selective pressure) or a transient vector (i.e., the
vector is gradually lost by host cells with increasing numbers of cell divisions).

The megalomicin PKS gene cluster comprises three ORFs (megAI, megAII,
and megAIII). Each ORF encodes two extender modules of the PKS; the first ORF
also encodes the loading module. Each extender module is composed of at least a
KS, an AT, and an ACP domain. The locations of the various encoding regions of
these ORFs are shown in Figure 2 and described with reference to the sequence
information below. The megalomicin PKS produces the polyketide known as 6-
dEB, shown in Figure 4. In megalomicin-producing organisms, 6-dEB is
converted to erythromycin C by a set of modification enzymes. Thus, 6-dEB is
converted to erythronolide B by the megF gene product (a homolog of the eryF
gene product), then to 3-alpha-mycarosyl-erythronolide B by the megBV gene
product (a homolog of the eryBV gene product), then to erythromycin D by the
megCIII gene product (a homolog of the eryCIII gene product), then to
erythromycin C by the megK gene product (a homolog of the eryK gene product).
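
The four-step conversion of 6-dEB to erythromycin C recited above can be
represented as an ordered list of (substrate, gene, homolog, product) records.
The Python sketch below is illustrative only; the function name is invented for
this example, and it assumes a strictly linear order with no substrate
flexibility, which is a simplification of real tailoring enzymes.

```python
# Steps as recited in the text: substrate, meg gene, ery homolog, product.
PATHWAY = [
    ("6-dEB", "megF", "eryF", "erythronolide B"),
    ("erythronolide B", "megBV", "eryBV", "3-alpha-mycarosyl-erythronolide B"),
    ("3-alpha-mycarosyl-erythronolide B", "megCIII", "eryCIII", "erythromycin D"),
    ("erythromycin D", "megK", "eryK", "erythromycin C"),
]


def furthest_product(genes, start="6-dEB"):
    """Follow the pathway from `start` for as long as each needed gene is present.

    Assumes strict linear order: a missing gene halts the pathway at that step.
    """
    compound = start
    for substrate, gene, _homolog, product in PATHWAY:
        if substrate != compound or gene not in genes:
            break
        compound = product
    return compound
```

With all four genes available the sketch returns erythromycin C; if megBV is
absent it stops at erythronolide B, the substrate of the missing step.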

In addition to these modification enzymes, such megalomicin-producing
organisms also contain the modification enzymes necessary for the biosynthesis of
the desosamine and mycarose moieties that are similarly utilized in erythromycin
biosynthesis, as shown in Figure 5. Megalomicin A contains the complete
erythromycin C structure, and its biosynthesis additionally involves the formation
of L-megosamine (L-rhodosamine) and its attachment to the C-6 hydroxyl
(Figures 3 and 5, inset), followed by acylation of the C-3''' and(or) C-4'''
hydroxyls as the terminal steps. L-megosamine is the same as N-dimethyl-L-
daunosamine; the daunosamine genes have been characterized from Streptomyces
peucetius (see Colombo and Hutchinson, J. Indust. Microbiol. Biotechnol., in
press; Otten et al., 1996, J. Bacteriol. 178:7316-7321, and references cited therein).
Some of the rhodosamine genes also have been cloned and partially characterized
from another anthracycline producing Streptomyces sp. (see Torkkell et al., 1997,
Mol. Gen. Genet. 256(2):203-209). Because the timing of the glycosylation with
TDP-megosamine in relation to the addition of mycarose and desosamine to
erythronolide B, plus the C-12 hydroxylation, is unknown, the pathway could
involve a different order of glycosylation and C-12 hydroxylation steps than the
one shown in Figure 5. Regardless, the megalomicin biosynthetic gene cluster
contains the genes to make L-rhodosamine and attach it to the correct macrolide
substrate.

The biosynthetic pathways to make the glycosides desosamine, mycarose,
and megosamine are shown in Figure 6. The present invention provides the genes
for each biosynthetic pathway shown in this Figure, and these recombinant genetic
pathways can be used alone or in any combination to confer the pathway to a
heterologous host.

The megalomicin PKS locus is similar to the eryA locus in size and
organization. Most of the deoxysugar biosynthesis genes are homologs of the eryB
mycarose and eryC desosamine biosynthesis and glycosyl attachment genes from
Saccharopolyspora erythraea (see Summers et al., 1997, Microbiol. 143:3251-
3262; Haydock et al., 1991, Mol. Gen. Genet. 230:120-128; Gaisser et al., 1997,
Mol. Gen. Genet. 256:239-251; Gaisser et al., 1998, Mol. Gen. Genet. 257:78-88,
incorporated herein by reference) or the picC homologs from the picromycin and
narbomycin producer (see PCT patent publication No. 99/61599 and Xue et al.,
1998, Proc. Nat. Acad. Sci. USA 95: 12111-12116, incorporated herein by
reference). The TDP-megosamine biosynthesis genes are homologs of the dnm
genes (see Figure 5) and the pikromycin N-dimethyltransferase gene or its
homologs reported in a cluster of L-rhodosamine biosynthesis genes. The putative
TDP-megosamine glycosyltransferase gene product (geneX in Figure 5) closely
resembles the deduced products of the eryBV, eryCIII, dnmS, and pikromycin
desVII genes, even though it recognizes different substrates than the products of
each of these genes.

The following Table 1 shows the location of the genes in the
Micromonospora megalomicea megalomicin biosynthetic pathway in the DNA
sequence set forth in SEQ ID NO:1 (see also Figure 7; note some gene
designations may be different in Figure 7).

Table 1 . Megalomicin Biosynthetic Gene Cluster 
25 Micromonospora megalomicea subsp. nigra (ATCC27598) 

Location Description 

1 .2451 sequence from cosmid pKOS079-138B 

complement^ ..1 44) megBVI (or megT), TDP-4-keto-6-deoxyglucose- 

30 2,3-dehydratase 

928..2061 megDVl, TDP-4-keto-6-deoxyglucose 3,4-isomerase 

2072. .3382 megDI, TDP-megosaminyl transferase {eryCIII 

homolog) 

2452. .40397 sequence of cosmid pKOS079-93D 

35 3462..4634 megG(or megY), mycarosyl acyltransferase 

465 1 ..5775 megDI I, deoxysugar transaminase (eryCI, DnrJ 

homolog) 
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10 



15 



20 



25 



30 



35 



40 



45 



5S22..6595 

d imethyl transferase 

6592..7197 

7220.. 8206 
dnmV 

complement(8228..9220) 



megDIII, TDP-daunosaminyl-N,N- 
{eryCVI homolog) 

megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase 

{eryB VII, dnm U homolog) 

'megDV, TDP-hexose 4-ketoreductase {eiyBIV, 

homolog) 



nexose 2,3-reductase 




compiement(yzzo. . 1 U4 /y) 


megBV, TDP-mycarosyl transferase 


complement^ iU4oj>.. 1 1424) 


megBIV, TDP-hexose 4-ketoreductase 


lit Ol O^OO 1 

1 21 oL. 22821 


A T 

megAI 


izioi..i3/yi 


Loading Module (L) 


I ZjUj,. 1 j*t /U 


ATT 


1 i <o*/c 1 nrki 


ACP-L 


l jo4y.. i ozU / 


bxtender Module 1 (1) 


i ->o4y.. l j i 2o 


KS1 


1 D4Z /.. 1 u4 /o 


Al I 


i / 1 ->!>.. l /oy4 


IvKl 


I / yH /.. 1 oZU / 


ACr 1 


lO/Oo..ZZJ /D 


bxtenaer Module 2 (2) 


1 1 0^/19 
i 0ZO6.. l 


CO 

JvoZ 


19876 ?0910 


AT? 


21517 220S3 


IvJVZ 


Z- J 1 0..i.6J / J 


rVUr z 


22867 DISS'S 


/Q ATT 
TriegsilJ 


22957 2725K 


oxiciiticr iviuuuic ^ ^ j» j 


?2957 24237 


IVOJ 


24544 25581 


AT3 


26230 26733 


iVi\_/ ^IJidOll VCy 


?6998 27258 


ACP3 


27313..33312                      Extender Module 4 (4)
27393..28590                      KS4
28897..29931                      AT4
29953..30477                      DH4
31396..32244                      ER4
322?7..32799                      KR4
33052..33312                      ACP4
33666..43271                      megAIII
33780..38120                      Extender Module 5 (5)
33780..35027                      KS5
35385..36419                      AT5
37068..37604                      KR5
37860..38120                      ACP5
38187..42425                      Extender Module 6 (6)
38187..39470                      KS6
39795..40811                      AT6
40398..46641                      sequences from cosmid pKOS079-93A
41406..41936                      KR6
42168..42425                      ACP6
42585..43271                      TE
43268..44344                      megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase
44355..45623                      megCIII, TDP-desosaminyl transferase
45620..46591                      megBII, TDP-4-keto-6-deoxy-L-glucose 2,3-dehydratase
complement(46660..47403)          megH, TEII
complement(47411..47980)          megF, C-6 hydroxylase
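
The location entries of Table 1 use a start..end notation, with complement(...) marking a feature on the opposite strand. As a minimal sketch of how such entries might be read programmatically (the helper names parse_location and feature_length are illustrative assumptions and are not part of the disclosure):

```python
import re

# Hypothetical helper for Table 1 style locations such as
# "42585..43271" or "complement(46660..47403)".
# Coordinates are treated as 1-based and inclusive.
_LOCATION = re.compile(r"^(complement\()?(\d+)\.\.(\d+)\)?$")


def parse_location(text):
    """Return (start, end, strand) for a simple location string."""
    match = _LOCATION.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    strand = "-" if match.group(1) else "+"
    return int(match.group(2)), int(match.group(3)), strand


def feature_length(text):
    """Length in nucleotides of the feature described by `text`."""
    start, end, _ = parse_location(text)
    return end - start + 1


print(parse_location("complement(46660..47403)"))  # megH entry
print(feature_length("42585..43271"))              # TE domain entry
```

Entries whose coordinates are illegible in the table cannot be parsed this way and would raise ValueError.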


In a specific embodiment, the invention provides an isolated nucleic acid 
fragment comprising a nucleotide sequence encoding a domain of the 
megalomicin polyketide synthase or a megalomicin modification enzyme. The
isolated nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated
nucleic acid fragment is a recombinant DNA compound. A nucleotide sequence
that is complementary to the nucleotide sequence encoding a domain of 
megalomicin PKS or a megalomicin modification enzyme is also provided. 

The isolated nucleic acid fragment can comprise a single, multiple or all 
the open reading frame(s) (ORF) of the megalomicin PKS or the megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acids of the invention
also include nucleic acids that encode one or more domains and one or more
modules of the megalomicin PKS. Exemplary domains of the megalomicin PKS
include a TE domain, a KS domain, an AT domain, an ACP domain, a KR
domain, a DH domain and an ER domain. In a preferred embodiment, the nucleic
acid comprises the coding sequence for a loading module, a thioesterase domain, 
and all six extender modules of the megalomicin PKS. 

Megalomicin modification enzymes include those enzymes involved in the 
conversion of 6-DEB into a megalomicin such as the enzymes encoded by megF, 
megBV, megCIII, megK, megDI and megG (or megY). Megalomicin modification
enzymes also include those enzymes involved in the biosynthesis of mycarose,
megosamine or desosamine, which are used as biosynthetic intermediates in the
biosynthesis of various megalomicin species and other related polyketides. The
enzymes that are involved in biosynthesis of mycarose, megosamine or
desosamine are described in Figures 5 and 10. The megalomicin PKS and
megalomicin modification enzymes are collectively referred to as megalomicin
biosynthetic enzymes; the genes encoding such enzymes are collectively referred
to as megalomicin biosynthetic genes; and nucleic acids that comprise a portion of 
or entire megalomicin biosynthetic genes are collectively referred to as 
megalomicin biosynthetic nucleic acid(s). 

In specific embodiments, the megalomicin biosynthetic nucleic acids
comprise the sequence of SEQ ID NO:1, or the coding regions thereof, or
nucleotide sequences encoding, in whole or in part, a megalomicin biosynthetic
enzyme protein. The isolated nucleic acids typically consist of at least 25
(continuous) nucleotides, 50 nucleotides, 100 nucleotides, 150 nucleotides, or 200
nucleotides of megalomicin biosynthetic nucleic acid sequence, or a full-length
megalomicin biosynthetic coding sequence. In another embodiment, the nucleic
acids are smaller than 35, 200, or 500 nucleotides in length. Nucleic acids can be
single or double stranded. Nucleic acids that hybridize to or are complementary to
the foregoing sequences, in particular the inverse complement to nucleic acids that
hybridize to the foregoing sequences (i.e., the inverse complement of a nucleic
acid strand has the complementary sequence running in reverse orientation to the
strand so that the inverse complement would hybridize without mismatches to the
nucleic acid strand) are also provided. In specific aspects, nucleic acids are
provided which comprise a sequence complementary to (specifically are the
inverse complement of) at least 10, 25, 50, 100, or 200 nucleotides or the entire
coding region of a megalomicin biosynthetic gene.
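
The definition of the inverse complement given above (the complementary sequence running in reverse orientation) can be illustrated with a short sketch. The function names are illustrative assumptions, and only the unambiguous DNA bases plus N are handled:

```python
# Watson-Crick pairing for DNA; N (any base) is mapped to N.
_PAIRING = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def inverse_complement(strand):
    """Complement every base, then reverse the orientation."""
    return strand.translate(_PAIRING)[::-1]


def hybridizes_without_mismatch(strand, other):
    """True when `other` is exactly the inverse complement of `strand`."""
    return other.upper() == inverse_complement(strand).upper()


print(inverse_complement("ATGCGT"))
print(hybridizes_without_mismatch("ATGCGT", "ACGCAT"))
```

Applying inverse_complement twice returns the original strand, which is a convenient sanity check.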

The megalomicin biosynthetic nucleic acids provided herein include those
with nucleotide sequences encoding substantially the same amino acid sequences
as found in native megalomicin biosynthetic enzyme proteins, and those encoding
amino acid sequences with functionally equivalent amino acids, as well as
megalomicin biosynthetic enzyme derivatives or analogs as described in Section
IV.

Some regions within the megalomicin PKS genes are highly homologous
or identical to one another, as can be readily identified by an analysis of the
sequence. The coding sequence for the KS and AT domains of module 2 shares
significant identity with the coding sequence for the KS and AT domains of
module 6. This sequence homology or identity at the nucleic acid, e.g., DNA, level
can render the nucleic acid unstable in certain host cells. To improve the stability
of the nucleic acids comprising a portion or the entire megalomicin PKS genes and
megalomicin modification enzyme genes, the nucleic acid or DNA sequences can
be changed to reduce or abolish the sequence homology or identity. Preferably,
the DNA codons of homologous regions within the PKS or the megalomicin
modification enzyme coding sequence are changed to reduce or abolish the
sequence homology or identity without changing the amino acid sequences
encoded by said changed DNA codons (see the examples below). The stability of
the nucleic acid or DNA can also be improved by codon changes that reduce or
abolish the sequence homology or identity while also changing the amino acid
sequence, provided that the amino acid sequence change(s) does not substantially
change the desired activity of the encoded megalomicin PKS. Thus, for example,
one can simply substitute for the megAIII ORF an ORF from the eryAIII, oleAIII,
picAIII, or picAIV genes.
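
The idea of changing codons to lower nucleotide identity while leaving the encoded amino acids unchanged can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard genetic code, expects upper-case A/C/G/T input, rotates mechanically through synonymous codons, and ignores host codon preference, which a practical design would have to weigh. The toy sequence and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def codons(dna):
    """Split an in-frame sequence into complete codons."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna):
    return "".join(CODON_TO_AA[c] for c in codons(dna))


def recode(dna):
    """Swap each codon for the next synonymous codon, where one exists.

    Met and Trp have a single codon each and are left unchanged.
    """
    out = []
    for codon in codons(dna):
        options = SYNONYMS[CODON_TO_AA[codon]]
        out.append(options[(options.index(codon) + 1) % len(options)])
    return "".join(out)


def identity(a, b):
    """Fraction of identical positions between equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


original = "ATGGCCTGCGGC"            # made-up sequence: Met-Ala-Cys-Gly
changed = recode(original)
print(changed, translate(changed) == translate(original))
print(identity(original, changed))
```

In this toy case the encoded peptide is unchanged while three of twelve nucleotide positions differ.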

The recombinant DNA compounds of the invention that encode the
megalomicin PKS and modification proteins or portions thereof are useful in a
variety of applications. While many of these applications relate to the heterologous
expression of the megalomicin biosynthetic genes or the construction of hybrid
PKS enzymes, many useful applications involve the natural megalomicin producer
Micromonospora megalomicea. For example, one can use the recombinant DNA
compounds of the invention to disrupt the megalomicin biosynthetic genes by
homologous recombination in Micromonospora megalomicea. The resulting host
cell is a preferred host cell for making polyketides modified by oxidation,
hydroxylation, glycosylation, and acylation in a manner similar to megalomicin,
because the genes that encode the proteins that perform these reactions are of
course present in the host cell, and because the host cell does not produce
megalomicin that could interfere with production or purification of the polyketide
of interest.

One illustrative recombinant host cell provided by the present invention
expresses a recombinant megalomicin PKS in which the module 1 KS domain is
inactivated by deletion or other mutation. In a preferred embodiment, the
inactivation is mediated by a change in the KS domain that renders it incapable of
binding substrate (called a KS1° mutation). In a particularly preferred
embodiment, this inactivation is rendered by a mutation in the codon for the active
site cysteine that changes the codon to another codon, such as an alanine codon.
Such constructs are especially useful when placed in translational reading frame
with extender modules 1 and 2 of a megalomicin PKS or the corresponding modules
of another PKS. The utility of these constructs is that host cells expressing, or
cell-free extracts containing, a PKS comprising the protein encoded thereby can be
fed or supplied with N-acylcysteamine thioesters of precursor molecules to prepare
a polyketide of interest. See U.S. patent application Serial No. 09/492,773, filed 27
Jan. 2000, and PCT patent publication No. 00/44717, both of which are
incorporated herein by reference. Such KS1° constructs of the invention are useful
in the production of 13-substituted-megalomicin compounds in Micromonospora
megalomicea host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl.
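
The codon change described above (a cysteine codon replaced by an alanine codon without disturbing the reading frame) can be illustrated in outline. The sequence, the residue number and the function name below are hypothetical; the position of the active site cysteine of the module 1 KS domain is not stated in this section and is not assumed here.

```python
CYS_CODONS = {"TGC", "TGT"}


def cys_to_ala(orf, residue_number, ala_codon="GCC"):
    """Replace the cysteine codon at a 1-based residue number with an
    alanine codon, leaving the rest of the reading frame untouched."""
    start = 3 * (residue_number - 1)
    codon = orf[start:start + 3].upper()
    if codon not in CYS_CODONS:
        raise ValueError(
            f"residue {residue_number} is {codon!r}, not a cysteine codon"
        )
    return orf[:start] + ala_codon + orf[start + 3:]


toy_orf = "ATGACCTGCTCGTAA"        # made-up ORF: Met-Thr-Cys-Ser-stop
print(cys_to_ala(toy_orf, 3))
```

The check on the original codon guards against mutating the wrong residue when the numbering is off by one.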

In a variant of this embodiment, one can employ a megalomicin PKS in
which the ACP domain of module 1 has been rendered inactive. In another
embodiment, one can delete the loading domain of the megalomicin PKS and
provide monoketide substrates for processing by the remainder of the PKS.

The compounds of the invention can also be used to construct recombinant
host cells of the invention in which coding sequences for one or more domains or
modules of the megalomicin PKS or for another megalomicin biosynthetic gene
have been deleted by homologous recombination with the Micromonospora
megalomicea chromosomal DNA. Those of skill in the art will appreciate that the
compounds used in the recombination process are characterized by their homology
with the chromosomal DNA and not by encoding a functional protein due to their
intended function of deleting or otherwise altering portions of chromosomal DNA.
For this and a variety of other applications, the compounds of the present
invention include not only those DNA compounds that encode functional proteins
but also those DNA compounds that are complementary or identical to any portion
of the megalomicin biosynthetic genes.

Thus, the invention provides a variety of modified Micromonospora
megalomicea host cells in which one or more of the megalomicin biosynthetic
genes have been mutated or disrupted. Transformation systems for M.
megalomicea have been described by Hasegawa et al., 1991, J. Bacteriol.
173:7004-11; and Takada et al., 1994, J. Antibiot. 47:1167-1170, both of which
are incorporated herein by reference. These cells are especially useful when it is
desired to replace the disrupted function with a gene product expressed by a
recombinant DNA expression vector. While such expression vectors of the
invention are described in more detail in the following Section, those of skill in
the art will appreciate that the vectors have application to M. megalomicea as well.
Such M. megalomicea host cells can be preferred host cells for expressing
megalomicin derivatives of the invention. Particularly preferred host cells of this
type include those in which the coding sequence for the loading module has been
mutated or disrupted, those in which one or more of any of the PKS gene ORFs
has been mutated or disrupted, and/or those in which the genes for one or more
modification enzymes (glycosylation, acylation, hydroxylation) have been mutated
or disrupted.

While the present invention provides many useful compounds having
application to, and recombinant host cells derived from, Micromonospora
megalomicea, many important applications of the present invention relate to the
heterologous expression of all or a portion of the megalomicin biosynthetic genes
in cells other than M. megalomicea, as described in Section V.

Section IV: The Megalomicin Biosynthetic Enzymes and Antibodies Recognizing
such Enzymes

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity to their wild type counterparts.

An exemplary nucleotide sequence encoding, and the corresponding amino
acid sequence of, a megalomicin biosynthetic enzyme is disclosed in SEQ ID
NO:1. Homologs (e.g., nucleic acids of the above-listed genes of species other
than Micromonospora megalomicea) or other related sequences (e.g., paralogs)
can be obtained by low, moderate or high stringency hybridization with all or a
portion of the particular sequence provided as a probe using methods well known
in the art for nucleic acid hybridization and cloning (e.g., as described in Section
III) in accordance with the methods of the present invention.

The megalomicin biosynthetic enzyme proteins, or domains thereof, of the
present invention can be obtained by methods well known in the art for protein
purification and recombinant protein expression in accordance with the methods
of the present invention. For recombinant expression of one or more of the
proteins, the nucleic acid containing all or a portion of the nucleotide sequence
encoding the protein can be inserted into an appropriate expression vector, i.e., a
vector that contains the necessary elements for the transcription and translation of
the inserted protein coding sequence. Transcriptional and translational signals can
be supplied by the native promoter for a megalomicin biosynthetic gene and/or
flanking regions.

A variety of host-vector systems may be utilized to express the protein
coding sequence. These include but are not limited to mammalian cell systems
infected with virus (e.g., vaccinia virus, adenovirus, and the like); insect cell
systems infected with virus (e.g., baculovirus); microorganisms such as yeast
containing yeast vectors; or bacteria transformed with bacteriophage DNA,
plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their
properties. Depending on the host-vector system utilized, any one of a number of
suitable transcription and translation elements may be used.

In a specific embodiment, a vector is used that comprises a promoter
operably linked to nucleic acid sequences encoding a megalomicin biosynthetic
enzyme, or a domain, fragment, derivative or homolog thereof, one or more
origins of replication, and optionally, one or more selectable markers (e.g., an
antibiotic resistance gene).

Expression vectors containing the sequences of interest can be identified
by three general approaches: (a) nucleic acid hybridization, (b) presence or
absence of "marker" gene function, and (c) expression of the inserted sequences.
In the first approach, megalomicin biosynthetic nucleic acid sequences can be
detected by nucleic acid hybridization to probes comprising sequences
homologous and complementary to the inserted sequences. In the second
approach, the recombinant vector/host system can be identified and selected based
upon the presence or absence of certain "marker" functions (e.g., binding to an
anti-megalomicin biosynthetic enzyme antibody, resistance to antibiotics,
occlusion body formation in baculovirus, and the like) caused by insertion of the
sequences of interest in the vector. For example, if a megalomicin biosynthetic
gene, or portion thereof, is inserted within the marker gene sequence of the vector,
recombinants containing the megalomicin biosynthetic gene fragment will be
identified by the absence of the marker gene function. In the third approach,
recombinant expression vectors can be identified by assaying for the megalomicin
biosynthetic gene products expressed by the recombinant vector. Such assays can
be based, for example, on the physical or functional properties of the interacting
species in in vitro assay systems, e.g., megalomicin synthesis activity or
immunoreactivity to antibodies specific for the protein.

Once recombinant megalomicin biosynthetic genes or nucleic acids are
identified, several methods known in the art can be used to propagate them in
accordance with the methods of the present invention. Once a suitable host
system and growth conditions have been established, recombinant expression
vectors can be propagated and amplified in quantity. As previously described, the
expression vectors or derivatives which can be used include, but are not limited to:
human or animal viruses such as vaccinia virus or adenovirus; insect viruses such
as baculovirus; yeast vectors; bacteriophage vectors such as lambda phage; and
plasmid and cosmid vectors.

In addition, a host cell strain may be chosen that modulates the expression
of the inserted sequences, or modifies or processes the expressed proteins in the
specific fashion desired. Expression from certain promoters can be elevated in the
presence of certain inducers; thus expression of the genetically-engineered
megalomicin biosynthetic enzymes may be controlled. Furthermore, different host
cells have characteristic and specific mechanisms for the translational and
post-translational processing and modification (e.g., glycosylation,
phosphorylation, and the like) of proteins. Appropriate cell lines or host systems
can be chosen to ensure the desired modification and processing of the foreign
protein is achieved. For example, expression in a bacterial system can be used to
produce an unglycosylated core protein, while expression in mammalian cells
ensures "native" glycosylation of a heterologous protein. Furthermore, different
vector/host expression systems may effect processing reactions to different extents.

In particular, megalomicin biosynthetic enzyme derivatives can be made by
altering their sequences by substitutions, additions or deletions that provide for
functionally equivalent molecules. Due to the degeneracy of nucleotide coding
sequences, other DNA sequences which encode substantially the same amino acid
sequence as a megalomicin biosynthetic gene can be used in the practice of the
present invention. These include but are not limited to nucleotide sequences
comprising all or portions of megalomicin biosynthetic genes that are altered by
the substitution of different codons that encode the same amino acid residue within
the sequence, thus producing a silent change. Likewise, the megalomicin
biosynthetic enzyme derivatives of the invention include, but are not limited to,
those containing, as a primary amino acid sequence, all or part of the amino acid
sequence of megalomicin biosynthetic enzymes, including altered sequences in
which functionally equivalent amino acid residues are substituted for residues
within the sequence resulting in a silent change. For example, one or more amino
acid residues within the sequence can be substituted by another amino acid of a
similar polarity which acts as a functional equivalent, resulting in a silent
alteration. Substitutes for an amino acid within the sequence may be selected
from other members of the class to which the amino acid belongs. For example,
the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan and methionine. The polar neutral
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and
glutamine. The positively charged (basic) amino acids include arginine, lysine and
histidine. The negatively charged (acidic) amino acids include aspartic acid and
glutamic acid.
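
The four classes listed above lend themselves to a simple lookup. The following sketch encodes exactly those classes in one-letter code and tests whether a substitution stays within a class; the function names are illustrative assumptions, and class membership alone does not establish that a given substitution preserves activity.

```python
# One-letter codes for the four classes named in the text.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "polar neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}


def residue_class(aa):
    """Return the class name for a one-letter amino acid code."""
    for name, members in AMINO_ACID_CLASSES.items():
        if aa.upper() in members:
            return name
    raise ValueError(f"not a standard amino acid: {aa!r}")


def is_same_class_substitution(original, substitute):
    """True when both residues belong to the same class."""
    return residue_class(original) == residue_class(substitute)


print(is_same_class_substitution("L", "I"))  # leucine -> isoleucine
print(is_same_class_substitution("D", "K"))  # aspartic acid -> lysine
```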

In a specific embodiment of the invention, proteins consisting of or
comprising a domain or a fragment of a megalomicin biosynthetic enzyme
consisting of at least 6 (continuous) amino acids, and nucleic acids encoding such
proteins, are provided. In other embodiments, the domain or fragment consists of at
least 10, 20, 30, 40, or 50 amino acids of a megalomicin biosynthetic enzyme. In
specific embodiments, such domains or fragments are not larger than 35, 100 or
200 amino acids. Derivatives or analogs of megalomicin biosynthetic enzyme
include but are not limited to molecules comprising regions that are substantially
homologous to megalomicin biosynthetic enzyme (in various embodiments, at least
30%, 40%, 50%, 60%, 70%, 80%, 90% or 95% identity over an amino acid
sequence of identical size or when compared to an aligned sequence in which the
alignment is done by a computer homology program known in the art in
accordance with the methods of the present invention), or whose encoding nucleic
acid is capable of hybridizing to a sequence encoding a megalomicin biosynthetic
enzyme under stringent, moderately stringent, or nonstringent conditions.
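
Percent identity over an aligned pair of sequences can be computed in several ways, and homology programs differ in how they treat gaps. As a minimal sketch under one stated assumption (only columns in which neither sequence has a gap are counted), with hypothetical function names and made-up sequences:

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold):
    """True when identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Made-up aligned fragments, one gap in the first sequence.
print(percent_identity("MKT-LV", "MRTALV"))
print(meets_threshold("MKT-LV", "MRTALV", 60))
```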

The megalomicin biosynthetic enzyme domains, derivatives and analogs of
the invention can be produced by various methods known in the art in accordance
with the methods of the present invention. The manipulations which result in their
production can occur at the gene or protein level. For example, the cloned
megalomicin biosynthetic gene sequence can be modified by any of numerous
strategies known in the art (Sambrook et al., 1990, Molecular Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York) in accordance with the methods of the present invention. The
sequences can be cleaved at appropriate sites with restriction endonuclease(s),
followed by further enzymatic modification if desired, isolated, and ligated in
vitro.

Additionally, the megalomicin biosynthetic enzyme-encoding nucleotide
sequence can be mutated in vitro or in vivo, to create and/or destroy translation,
initiation, and/or termination sequences, or to create variations in coding regions
and/or form new restriction endonuclease sites or destroy pre-existing ones, to
facilitate further in vitro modification. Any technique for mutagenesis known in
the art can be used in accordance with the methods of the present invention,
including but not limited to, chemical mutagenesis and in vitro site-directed
mutagenesis (Hutchinson et al., J. Biol. Chem. 253:6551-6558 (1978)), use of
TAB® linkers (Pharmacia), and the like.

Once a recombinant cell expressing a megalomicin biosynthetic enzyme
protein, or a domain, fragment or derivative thereof, is identified, the individual
gene product can be isolated and analyzed. This is achieved by assays based on
the physical and/or functional properties of the protein, including, but not limited
to, radioactive labeling of the product followed by analysis by gel electrophoresis,
immunoassay, cross-linking to marker-labeled product, and the like.

The megalomicin biosynthetic enzyme proteins may be isolated and
purified, for example from recombinant host cells expressing the complexes or
proteins, by standard methods known in the art in accordance with the methods of
the invention, including but not restricted to column chromatography (e.g., ion
exchange, affinity, gel exclusion, reversed-phase high pressure, fast protein liquid,
and the like), differential centrifugation, differential solubility, or by any other
standard technique used for the purification of proteins. Functional properties
may be evaluated using any suitable assay known in the art in accordance with the
methods of the present invention.

Alternatively, once a megalomicin biosynthetic enzyme or its domain or
derivative is identified, the amino acid sequence of the protein can be deduced
from the nucleotide sequence of the gene which encodes it. As a result, the
protein or its domain or derivative can be synthesized by standard chemical
methods known in the art in accordance with the methods of the present invention
(see Hunkapiller et al., Nature 310:105-111 (1984)).

Manipulations of megalomicin biosynthetic enzymes may be made at the
protein level. Included within the scope of the invention are megalomicin
biosynthetic enzyme domains, derivatives or analogs or fragments, which are
differentially modified during or after translation, e.g., by glycosylation,
acetylation, phosphorylation, amidation, derivatization by known
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule
or other cellular ligand, and the like. Any of numerous chemical modifications
may be carried out by known techniques, including but not limited to specific
chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8
protease, NaBH4, acetylation, formylation, oxidation, reduction, metabolic
synthesis in the presence of tunicamycin, and the like.

In specific embodiments, the megalomicin biosynthetic enzymes are
modified to include a fluorescent label. In other specific embodiments, the
megalomicin biosynthetic enzyme is modified to have a heterofunctional reagent;
such heterofunctional reagents can be used to crosslink the members of the
complex.

In addition, domains, analogs and derivatives of a megalomicin
biosynthetic enzyme can be chemically synthesized. For example, a peptide
corresponding to a portion of a megalomicin biosynthetic enzyme, which
comprises the desired domain or which mediates the desired activity in vitro can
be synthesized by use of a peptide synthesizer. Furthermore, if desired,
nonclassical amino acids or chemical amino acid analogs can be introduced as a
substitution or addition into the megalomicin biosynthetic enzyme sequence.
Non-classical amino acids include but are not limited to the D-isomers of the
common amino acids, alpha-amino isobutyric acid, 4-aminobutyric acid,
2-aminobutyric acid, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid,
3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline,
sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, fluoro-amino acids, designer amino acids such as
β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and amino
acid analogs in general. Furthermore, the amino acid can be D (dextrorotatory) or
L (levorotatory).

In cases where natural products are suspected of being mutant or are 
isolated from new species, the amino acid sequence of the megalomicin 
biosynthetic enzyme isolated from the natural source, as well as those expressed in
vitro, or from synthesized expression vectors in vivo or in vitro, can be determined
from analysis of the DNA sequence, or alternatively, by direct sequencing of the 
isolated protein. Such analysis may be performed by manual sequencing or 
through use of an automated amino acid sequenator. 

The megalomicin biosynthetic enzyme proteins may also be analyzed by
hydrophilicity analysis (Hopp and Woods, Proc. Natl. Acad. Sci. USA 78:3824-
3828 (1981)). A hydrophilicity profile can be used to identify the hydrophobic
and hydrophilic regions of the proteins, and help predict their orientation in
designing substrates for experimental manipulation, such as in binding
experiments, antibody synthesis, and the like. Secondary structural analysis can
also be done to identify regions of the megalomicin biosynthetic enzyme that
assume specific structures (Chou and Fasman, Biochemistry 13:222-23 (1974)).
Manipulation, translation, secondary structure prediction, hydrophilicity and
hydrophobicity profiles, open reading frame prediction and plotting, and
determination of sequence homologies, can be accomplished using computer
software programs available in the art.
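
A hydrophilicity profile of the kind referred to above is, in outline, a sliding-window average of per-residue values. The sketch below is illustrative only: the scale values are those commonly tabulated for the Hopp and Woods scale and should be checked against the cited publication before use, the six-residue window is an assumption of this example, and the test peptide and function names are made up.

```python
# Per-residue hydrophilicity values as commonly tabulated for the
# Hopp and Woods scale (verify against the original publication).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(protein, window=6):
    """Mean hydrophilicity of every run of `window` consecutive residues."""
    scores = [HOPP_WOODS[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(protein, window=6):
    """Return (0-based start index, subsequence) of the highest-scoring window."""
    profile = hydrophilicity_profile(protein, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window]


toy_peptide = "LLLLLLKDEKRDLLLLLL"   # made-up sequence
print(most_hydrophilic_window(toy_peptide))
```

Windows with the highest averages are candidate surface-exposed segments, which is how such profiles are typically used when choosing peptides for antibody production.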

Other methods of structural analysis including but not limited to X-ray
crystallography (Engstrom, Biochem. Exp. Biol. 11:7-13 (1974)), mass
spectroscopy and gas chromatography (Methods in Protein Science, J. Wiley and
Sons, New York, 1997), and computer modeling (Fletterick and Zoller, eds., 1986,
Computer Graphics and Molecular Modeling, In: Current Communications in
Molecular Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor Press,
New York) can also be employed.

The invention also provides an antibody, or a fragment or derivative
thereof, which immuno-specifically binds to a domain of megalomicin polyketide
synthase (PKS) or a megalomicin modification enzyme. In a specific
embodiment, an antibody which immuno-specifically binds to a domain of the
megalomicin biosynthetic enzyme encoded by a nucleic acid that hybridizes to a
nucleic acid having the nucleotide sequence set forth in SEQ ID NO:1, or a
fragment or derivative of said antibody containing the binding domain thereof, is
provided. Preferably, the antibody is a monoclonal antibody.

The megalomicin biosynthetic enzyme protein and domains, fragments,
homologs and derivatives thereof may be used as immunogens to generate
antibodies which immunospecifically bind such immunogens. Such antibodies
include but are not limited to polyclonal, monoclonal, chimeric, single chain, Fab
fragments, and an Fab expression library.

Various procedures known in the art may be used for the production of
polyclonal antibodies to a megalomicin biosynthetic enzyme protein of the
invention, its domains, derivatives, fragments or analogs in accordance with the
methods of the present invention.

For production of the antibody, various host animals can be immunized by
injection with the native megalomicin biosynthetic enzyme protein or a synthetic
version, or a derivative of the foregoing, such as a cross-linked megalomicin
biosynthetic enzyme. Such host animals include but are not limited to rabbits,
mice, rats, and the like. Various adjuvants can be used to increase the
immunological response, depending on the host species, and include but are not
limited to Freund's (complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human
adjuvants such as bacille Calmette-Guerin (BCG) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed towards a megalomicin
biosynthetic enzyme or domains, derivatives, fragments or analogs thereof, any
technique that provides for the production of antibody molecules by continuous
cell lines in culture may be used. Such techniques include but are not restricted to
the hybridoma technique originally developed by Kohler and Milstein (Nature
256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., Immunology Today 4:72 (1983)), and the EBV hybridoma
technique to produce human monoclonal antibodies (Cole et al., in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 (1985)). In an
additional embodiment, monoclonal antibodies can be produced in germ-free
animals (WO 89/12690). Human antibodies may be used and can be obtained by
using human hybridomas (Cote et al., Proc. Natl. Acad. Sci. USA 80:2026-2030
(1983)) or by transforming human B cells with EBV in vitro (Cole et al., in
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96
(1985)). Techniques developed for the production of "chimeric antibodies"
(Morrison et al., Proc. Natl. Acad. Sci. USA 81:6851-6855 (1984); Neuberger et
al., Nature 312:604-608 (1984); Takeda et al., Nature 314:452-454 (1985)) by
splicing the genes from a mouse antibody molecule specific for the megalomicin
biosynthetic enzyme protein together with genes from a human antibody molecule
of appropriate biological activity can be used; such antibodies are within the scope
of this invention.

Techniques described for the production of single chain antibodies (U.S.
Patent No. 4,946,778) can be adapted to produce megalomicin biosynthetic
enzyme-specific single chain antibodies. An additional embodiment utilizes the
techniques described for the construction of Fab expression libraries (Huse et
al., Science 246:1275-1281 (1989)) to allow rapid and easy identification of
monoclonal Fab fragments with the desired specificity for megalomicin
biosynthetic enzyme, or domains, derivatives, or analogs thereof. Non-human
antibodies can be "humanized" by known methods (see, e.g., U.S. Patent No.
5,225,539).

Antibody fragments that contain the idiotypes of a megalomicin biosynthetic
enzyme can be generated by techniques known in the art in accordance with the
methods of the present invention. For example, such fragments include but are
not limited to: the F(ab')2 fragment, which can be produced by pepsin digestion
of the antibody molecule; the Fab' fragments that can be generated by reducing
the disulfide bridges of the F(ab')2 fragment; the Fab fragments that can be
generated by treating the antibody molecule with papain and a reducing agent;
and Fv fragments.

In the production of antibodies, screening for the desired antibody can be
accomplished by techniques known in the art in accordance with the methods of
the present invention, e.g., ELISA (enzyme-linked immunosorbent assay). To
select antibodies specific to a particular domain of the megalomicin biosynthetic
enzyme, one may assay generated hybridomas for a product that binds to the
fragment of a megalomicin biosynthetic enzyme that contains such a domain.

The foregoing antibodies can be used in methods known in the art relating
to the localization and/or quantitation of megalomicin biosynthetic enzyme
proteins, e.g., for imaging these proteins or measuring levels thereof in samples, in
accordance with the methods of the present invention.

Section V: Heterologous Expression of the Megalomicin Biosynthetic Genes 
In one important embodiment, the invention provides methods for the
heterologous expression of one or more of the megalomicin biosynthetic genes
and recombinant DNA expression vectors useful in the method. For purposes of
the invention, any host cell other than Micromonospora megalomicea is a
heterologous host cell. Thus, included within the scope of the invention in
addition to isolated nucleic acids encoding domains, modules, or proteins of the
megalomicin PKS and modification enzymes, are recombinant expression vectors
that include such nucleic acids. The term expression vector refers to a nucleic acid
that can be introduced into a host cell or cell-free transcription and translation
system. An expression vector can be maintained permanently or transiently in a
cell, whether as part of the chromosomal or other DNA in the cell or in any
cellular compartment, such as a replicating vector in the cytoplasm. An expression
vector also comprises a promoter that drives expression of an RNA, which
typically is translated into a polypeptide in the cell or cell extract. For efficient
translation of RNA into protein, the expression vector also typically contains a
ribosome-binding site sequence positioned upstream of the start codon of the
coding sequence of the gene to be expressed. Other elements, such as enhancers,
secretion signal sequences, transcription termination sequences, and one or more
marker genes by which host cells containing the vector can be identified and/or
selected, may also be present in an expression vector. Selectable markers, i.e.,
genes that confer antibiotic resistance or sensitivity, are preferred and confer a
selectable phenotype on transformed cells when the cells are grown in an
appropriate selective medium.

The various components of an expression vector can vary widely,
depending on the intended use of the vector and the host cell(s) in which the
vector is intended to replicate or drive expression. Expression vector components
suitable for the expression of genes and maintenance of vectors in E. coli, yeast,
Streptomyces, and other commonly used cells are widely known and commercially
available. For example, suitable promoters for inclusion in the expression vectors
of the invention include those that function in eucaryotic or procaryotic host cells.
Promoters can comprise regulatory sequences that allow for regulation of
expression relative to the growth of the host cell or that cause the expression of a
gene to be turned on or off in response to a chemical or physical stimulus. For E.
coli and certain other bacterial host cells, promoters derived from genes for
biosynthetic enzymes, antibiotic-resistance conferring enzymes, and phage
proteins can be used and include, for example, the galactose, lactose (lac),
maltose, tryptophan (trp), beta-lactamase (bla), bacteriophage lambda PL, and T5
promoters. In addition, synthetic promoters, such as the tac promoter (U.S. Patent
No. 4,551,433), can also be used.

Thus, recombinant expression vectors contain at least one expression
system, which, in turn, is composed of at least a portion of the megalomicin PKS
and/or other megalomicin biosynthetic gene coding sequences operably linked to a
promoter and optionally termination sequences that operate to effect expression of
the coding sequence in compatible host cells. The host cells are modified by
transformation with the recombinant DNA expression vectors of the invention to
contain the expression system sequences either as extrachromosomal elements or
integrated into the chromosome. The resulting host cells of the invention are
useful in methods to produce PKS and post-PKS modification enzymes as well as
polyketides and antibiotics and other useful compounds derived therefrom.

Preferred host cells for purposes of selecting vector components for
expression vectors of the present invention include fungal host cells such as yeast
and procaryotic host cells such as E. coli and Streptomyces, but mammalian host
cells can also be used. In hosts such as yeasts, plants, or mammalian cells that
ordinarily do not produce polyketides, it may be necessary to provide, also
typically by recombinant means, suitable holo-ACP synthases to convert the
recombinantly produced PKS to functionality. Provision of such enzymes is
described, for example, in PCT publication Nos. WO 97/13845 and WO 98/27203,
each of which is incorporated herein by reference. Particularly preferred host cells
for purposes of the present invention are Streptomyces and Saccharopolyspora
host cells, as discussed in greater detail below.

In a preferred embodiment, the expression vectors of the invention are
used to construct a heterologous recombinant Streptomyces host cell that expresses
a recombinant PKS of the invention. Streptomyces is a convenient host for
expressing polyketides, because polyketides are naturally produced in certain
Streptomyces species, and Streptomyces cells generally produce the precursors
needed to form the desired polyketide. Those of skill in the art will recognize that,
if a Streptomyces host cell produces any portion of a PKS enzyme or produces a
polyketide modification enzyme, the recombinant vector need drive expression of
only those genes constituting the remainder of the desired PKS enzyme or other
polyketide-modifying enzymes. Thus, such a vector may comprise only a single
ORF, with the desired remainder of the polypeptides constituting the PKS
provided by the genes on the host cell chromosomal DNA.

If a Streptomyces or other host cell ordinarily produces polyketides, it may
be desirable to modify the host so as to prevent the production of endogenous
polyketides prior to its use to express a recombinant PKS of the invention. Such
modified hosts include S. coelicolor CH999 and similarly modified S. lividans
described in U.S. Patent No. 5,672,491, and PCT publication Nos. WO 95/08548
and WO 96/40968, incorporated herein by reference. In such hosts, it may not be
necessary to provide enzymatic activities for all of the desired post-translational
modifications of the enzymes that make up the recombinantly produced PKS,
because the host naturally expresses such enzymes. In particular, these hosts
generally contain holo-ACP synthases that provide the phosphopantetheinyl
residue needed for functionality of the PKS.

The invention provides a wide variety of expression vectors for use in
Streptomyces. The replicating expression vectors of the present invention include,
for example and without limitation, those that comprise an origin of replication
from a low copy number vector, such as SCP2* (see Hopwood et al., Genetic
Manipulation of Streptomyces: A Laboratory Manual (The John Innes Foundation,
Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and
Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference),
SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by
reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and
Bierman et al., 1992, Gene 116: 43-49, each of which is incorporated herein by
reference), or a high copy number vector, such as pIJ101 and pJV1 (see Katz et
al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171:
5782-5781; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is
incorporated herein by reference). For non-replicating and integrating vectors and
generally for any vector, it is useful to include at least an E. coli origin of
replication, such as from pUC, p1P, p1I, and pBR. For phage based vectors, the
phage phiC31 and its derivative KC515 can be employed (see Hopwood et al.,
supra). Also, plasmid pSET152, plasmid pSAM, and plasmids pSE101 and pSE211,
all of which integrate site-specifically in the chromosomal DNA of S. lividans, can
be employed for purposes of the present invention.

The Streptomyces recombinant expression vectors of the invention
typically comprise one or more selectable markers, including antibiotic resistance
conferring genes selected from the group consisting of the ermE (confers
resistance to erythromycin and lincomycin), tsr (confers resistance to
thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4
(confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and
neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to
viomycin) resistance conferring genes. Alternatively, several polyketides are
naturally colored, and this characteristic can provide a built-in marker for
identifying cells.

Megalomicins are currently produced only by the relatively genetically
intractable host Micromonospora megalomicea. This bacterium has not been
commonly used in the fermentation industry for the large-scale production of
antibiotics, and methods for high level production of megalomicin and its analogs
are needed. In contrast, the streptomycete bacteria have been widely used for
almost 50 years and are excellent hosts for production of megalomicin and its
analogs. Streptomyces lividans and S. coelicolor have been developed for the
expression of heterologous PKS systems. These organisms can stably maintain
cloned heterologous PKS genes, express them at high levels under controlled
conditions, and modify the corresponding PKS proteins (e.g., by
phosphopantetheinylation) so that they are capable of production of the polyketide
they encode. Furthermore, these hosts contain the necessary pathways to produce
the substrates required for polyketide synthesis, e.g., propionyl-CoA and
methylmalonyl-CoA. A wide variety of cloning and expression vectors are
available for these hosts, as are methods for the introduction and stable
maintenance of large segments of foreign DNA. Relative to Micromonospora spp.,
S. lividans and S. coelicolor grow well on a number of media and have been
adapted for high level production of polyketides in fermentors. If production levels
are low, a number of rational approaches are available to improve yield (see
Hosted and Baltz, 1996, Trends Biotechnol. 14(7):245-50, incorporated herein by
reference). Empirical methods to increase the titers of these macrolides, long since
proven effective for numerous bacterial polyketides, can also be employed.

Preferred Streptomyces host cell/vector combinations of the invention
include S. coelicolor CH999 and S. lividans K4-114 host cells, which have been
modified so as not to produce the polyketide actinorhodin, and expression vectors
derived from the pRM1 and pRM5 vectors, as described in U.S. Patent Nos.
5,830,750 and 6,022,731 and U.S. patent application Serial No. 09/181,833, filed
28 Oct. 1998, each of which is incorporated herein by reference. These vectors are
particularly preferred in that they contain promoters compatible with numerous
and diverse Streptomyces spp. Particularly useful promoters for Streptomyces host
cells include those from PKS gene clusters that result in the production of
polyketides as secondary metabolites, including promoters from aromatic (Type II)
PKS gene clusters. Examples of Type II PKS gene cluster promoters are act gene
promoters and tcm gene promoters; examples of Type I PKS gene cluster
promoters are the promoters of the spiramycin PKS genes and DEBS genes. The
present invention also provides the megalomicin biosynthetic gene promoters in
recombinant form. These promoters can be used to drive expression of the
megalomicin biosynthetic genes or any other coding sequence of interest in host
cells in which the promoter functions, particularly Micromonospora megalomicea
and generally any Streptomyces species.

As described above, particularly useful control sequences are those that
alone or together with suitable regulatory systems activate expression during
transition from growth to stationary phase in the vegetative mycelium. The
promoter contained in the aforementioned plasmid pRM5, i.e., the actI/actIII
promoter pair and the actII-ORF4 activator gene, is particularly preferred. Other
useful Streptomyces promoters include without limitation those from the ermE
gene and the melC1 gene, which act constitutively, and the tipA gene and the merA
gene, which can be induced at any growth stage. In addition, the T7 RNA
polymerase system has been transferred to Streptomyces and can be employed in
the vectors and host cells of the invention. In this system, the coding sequence for
the T7 RNA polymerase is inserted into a neutral site of the chromosome or in a
vector under the control of the inducible merA promoter, and the gene of interest is
placed under the control of the T7 promoter. As noted above, one or more
activator genes can also be employed to enhance the activity of a promoter.
Activator genes in addition to the actII-ORF4 gene described above include dnrI,
redD, and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra).

To provide a preferred host cell and vector for purposes of the invention,
the megalomicin biosynthetic genes are placed on a recombinant expression vector
and transferred to the non-macrolide producing hosts Streptomyces lividans K4-114
and S. coelicolor CH999. Transformation of S. lividans K4-114 or S.
coelicolor CH999 with this expression vector results in a strain which produces
detectable amounts of megalomicin as determined by analysis of extracts by
LC/MS. As noted above, the present invention also provides recombinant DNA
compounds in which the encoded megalomicin module 1 KS domain is
inactivated (the KS1° mutation). The introduction into Streptomyces lividans or S.
coelicolor of a recombinant expression vector of the invention that encodes a
megalomicin PKS with a KS1° domain produces a host cell useful for making
polyketides by a process known as diketide feeding. The resulting host cells can be
fed or supplied with N-acylcysteamine thioesters of precursor molecules to
prepare megalomicin derivatives. Such cells of the invention are especially useful
in the production of 13-substituted-6-deoxyerythronolide B compounds in
recombinant host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl. In a preferred embodiment, the meg PKS
is produced from a recombinant construct in which the megAIII gene has been
altered to abolish the regions of identical coding sequence it otherwise shares with
the megAI gene, or a hybrid PKS is employed in which the megAIII gene product
has been replaced by the oleAIII gene product. Recombinant oleAIII genes are
described in, for example, PCT patent publication No. 00/026349 and U.S. patent
application Serial No. 09/428,517, filed 28 Oct. 1999, both of which are
incorporated herein by reference.

The recombinant host cells of the invention can express all of the
megalomicin biosynthetic genes or only a subset of the same. For example, if only
the genes for the megalomicin PKS are expressed in a host cell that otherwise does
not produce polyketide modifying enzymes that can act on the polyketide
produced, then the host cell produces unmodified polyketides, called macrolide
aglycones. Such macrolide aglycones can be hydroxylated and glycosylated by
adding them to the fermentation of a strain such as, for example, Streptomyces
antibioticus or Saccharopolyspora erythraea, that contains the requisite
modification enzymes.

There are a wide variety of diverse organisms that can modify macrolide
aglycones to provide compounds with, or that can be readily modified to have,
useful activities. For example, as shown in Figure 5, Saccharopolyspora erythraea
can convert 6-dEB to a variety of useful compounds. The erythronolide 6-dEB is
converted by the eryF gene product to erythronolide B, which is, in turn,
glycosylated by the eryBV gene product to obtain 3-O-mycarosylerythronolide B,
which contains L-mycarose at C-3. The eryCIII gene product then converts this
compound to erythromycin D by glycosylation with D-desosamine at C-5.
Erythromycin D, therefore, differs from 6-dEB through glycosylation and by the
addition of a hydroxyl group at C-6. Erythromycin D can be converted to
erythromycin B in a reaction catalyzed by the eryG gene product by methylating
the L-mycarose residue at C-3. Erythromycin D is converted to erythromycin C by
the addition of a hydroxyl group at C-12 in a reaction catalyzed by the eryK gene
product. Erythromycin A is obtained from erythromycin C by methylation of the
mycarose residue in a reaction catalyzed by the eryG gene product. The
unmodified megalomicin compounds provided by the present invention, such as,
for example, the 6-dEB or 6-dEB analogs, produced in Streptomyces lividans, can
be provided to cultures of S. erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the examples below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production.
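
The conversion steps recited in the preceding paragraph can be summarized, purely for illustration, as a small directed graph. The following Python sketch is not part of the disclosed methods; the compound and gene names are taken from the text above, while the data layout and the function name `route` are assumptions introduced here. It lists the gene products needed to convert one compound to another and shows that removing a gene product (as in an eryG mutant) makes the downstream compounds unreachable.

```python
from collections import deque

# substrate -> list of (gene product, resulting compound), as recited in the text.
# The dictionary layout is an assumption made for this sketch.
CONVERSIONS = {
    "6-dEB": [("eryF", "erythronolide B")],
    "erythronolide B": [("eryBV", "3-O-mycarosylerythronolide B")],
    "3-O-mycarosylerythronolide B": [("eryCIII", "erythromycin D")],
    "erythromycin D": [("eryG", "erythromycin B"), ("eryK", "erythromycin C")],
    "erythromycin C": [("eryG", "erythromycin A")],
}


def route(start, target, inactive=frozenset()):
    """Return the ordered gene products that convert start to target.

    Gene products listed in `inactive` are treated as unavailable.
    Returns None when the target cannot be reached.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, genes = queue.popleft()
        if compound == target:
            return genes
        for gene, product in CONVERSIONS.get(compound, []):
            if gene in inactive or product in seen:
                continue
            seen.add(product)
            queue.append((product, genes + [gene]))
    return None


if __name__ == "__main__":
    print(route("6-dEB", "erythromycin A"))
    print(route("6-dEB", "erythromycin A", inactive={"eryG"}))
```

In this simplified model, a strain lacking eryG can still reach erythromycin C but not erythromycin A or B, which is consistent with the use of mutant strains to accumulate a preferred compound.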

Moreover, there are other useful organisms that can be employed to
hydroxylate and/or glycosylate the compounds of the invention. As described
above, the organisms can be mutants unable to produce the polyketide normally
produced in that organism, the fermentation can be carried out on plates or in large
fermentors, and the compounds produced can be chemically altered after
fermentation. Thus, Streptomyces venezuelae, which produces picromycin,
contains enzymes that can transfer a desosaminyl group to the C-5 hydroxyl and a
hydroxyl group to the C-12 position. In addition, S. venezuelae contains a
glucosylation activity that glucosylates the 2'-hydroxyl group of the desosamine
sugar. This latter modification reduces antibiotic activity, but the glucosyl residue
is removed by enzymatic action prior to release of the polyketide from the cell.
Another organism, S. narbonensis, contains the same modification enzymes as S.
venezuelae, except the C-12 hydroxylase. Thus, the present invention provides the
compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. narbonensis
and S. venezuelae.

Other organisms suitable for making compounds of the invention include
Micromonospora megalomicea (discussed above), Streptomyces antibioticus, S.
fradiae, and S. thermotolerans. S. antibioticus produces oleandomycin and
contains enzymes that hydroxylate the C-6 and C-12 positions, glycosylate the C-3
hydroxyl with oleandrose and the C-5 hydroxyl with desosamine, and form an
epoxide at C-8-C-8a. S. fradiae contains enzymes that glycosylate the C-5
hydroxyl with mycaminose and then the 4'-hydroxyl of mycaminose with
mycarose, forming a disaccharide. S. thermotolerans contains the same activities
as S. fradiae, as well as acylation activities. Thus, the present invention provides
the compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. antibioticus,
S. fradiae, and S. thermotolerans.
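
As an illustrative aid only, the modification activities attributed to each organism in the two preceding paragraphs can be tabulated and queried. The Python sketch below is an assumption-laden summary rather than part of the disclosure: the activity labels, the dictionary `ACTIVITIES`, and the function `hosts_providing` are hypothetical names chosen here, and the transient glucosylation activity of S. venezuelae is omitted because the text states that the glucosyl residue is removed before release.

```python
# Organism -> modification activities named in the text.
# Labels are shorthand invented for this sketch, not terms of art.
ACTIVITIES = {
    "S. venezuelae": {"C-5 desosamine", "C-12 hydroxyl"},
    # Same as S. venezuelae except the C-12 hydroxylase.
    "S. narbonensis": {"C-5 desosamine"},
    "S. antibioticus": {
        "C-6 hydroxyl",
        "C-12 hydroxyl",
        "C-3 oleandrose",
        "C-5 desosamine",
        "C-8/C-8a epoxide",
    },
    "S. fradiae": {"C-5 mycaminose", "4'-mycarose"},
    # Same as S. fradiae, plus acylation activities.
    "S. thermotolerans": {"C-5 mycaminose", "4'-mycarose", "acylation"},
}


def hosts_providing(required):
    """Return, in sorted order, organisms whose activities include all of `required`."""
    required = set(required)
    return sorted(host for host, acts in ACTIVITIES.items() if required <= acts)


if __name__ == "__main__":
    print(hosts_providing({"C-5 desosamine", "C-12 hydroxyl"}))
```

A lookup of this kind only restates the text; whether a given organism accepts a particular macrolide aglycone as a substrate must still be determined experimentally.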

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene (as discussed in the next Section),
can be included on expression vectors suitable for expression of the encoded gene
products in Saccharopolyspora erythraea, Streptomyces antibioticus, S.
venezuelae, S. narbonensis, Micromonospora megalomicea, S. fradiae, and S.
thermotolerans.

A number of erythromycin high-producing strains of Saccharopolyspora
erythraea and Streptomyces fradiae have been developed, and in a preferred
embodiment, the megalomicin PKS and/or other megalomicin biosynthetic genes
are introduced into such strains (or erythromycin non-producing mutants thereof)
to provide the corresponding modified megalomicin compounds in high yields.
Those of skill in the art will appreciate that S. erythraea contains the desosamine
and mycarose biosynthetic and transfer genes as well as DEBS, which, as noted
above, makes the same macrolide aglycone, 6-dEB, as the megalomicin PKS. S.
erythraea does not make megosamine or contain the corresponding transferase
gene, and does not contain the acylation gene of Micromonospora megalomicea.
Finally, the S. erythraea eryG gene product converts mycarose to cladinose, which
does not occur in M. megalomicea. Thus, the present invention provides a wide
variety of S. erythraea recombinant host cells, including, for example, those that
contain:

(i) wild-type erythromycin biosynthetic genes with recombinant
megosamine biosynthetic and transfer genes, with and without megalomicin
acylation genes;

(ii) wild-type erythromycin biosynthetic genes except eryG, with
recombinant megosamine biosynthetic and transfer genes, with and without
megalomicin acylation genes; and

(iii) as in (i) and (ii), except that the eryA genes are inactive or deleted and
recombinant megA genes have been introduced.

The invention provides other S. erythraea strains as well, including those 
in which any one or more of the erythromycin biosynthetic genes have been 
deleted or otherwise rendered inactive and in which at least one megalomicin 
biosynthetic gene has been introduced.

For example, the present invention enables one to express the megosamine
genes in a Saccharopolyspora erythraea eryG mutant in which the erythromycin C
made by this mutant is converted to megalomicin A. Alternatively, one could use
an erythromycin C high-producing strain of S. erythraea in biotransformation
methods in which the erythromycin C is fed to a Streptomyces lividans strain
carrying only the megosamine biosynthesis and glycosyl transferase genes. As
another alternative, one could use a strain of S. lividans that carries suitable
erythromycin production genes along with the daunosamine biosynthesis genes
plus geneX and geneY of Figure 5, or all of the megosamine biosynthesis genes, to
produce megalomicin A.

All or some of the megalomicin gene cluster can be easily cloned under
control of a suitable promoter in pCK7 or pSET152 either in one or two plasmids
and introduced into the Saccharopolyspora erythraea eryG mutant. The
actII-ORF4/actIp system and the phiC31/int system in pSET function well in this
organism (see Rowe et al., 1998, Gene 216:215-23, incorporated herein by
reference). Alternatively, the megosamine biosynthesis genes are introduced into
Streptomyces lividans on the same plasmids and the production of megalomicin A
or its precursor is mediated by bioconversion, done by feeding erythronolide B,
3-alpha-mycarosylerythronolide B, erythromycin D or erythromycin C to the S.
lividans strain.

Lack of adequate resistance to megalomicin A in S. erythraea or S.
lividans is not expected, because both organisms have MLS resistance genes
(ermE and mgt/lrm, respectively), which confer resistance to several 14-membered
macrolides (see Cundliffe, 1989, Annu. Rev. Microbiol. 43:207-33; Jenkins and
Cundliffe, 1991, Gene 108:55-62; and Cundliffe, 1992, Gene 115:75-84, each of
which is incorporated herein by reference). One can also readily determine the
level of resistance of the S. erythraea eryG mutant and the S. lividans host cells to
megalomicin A, both in plate tests and in liquid medium. One can repeat the
bioconversion method using an eryG mutant of a high erythromycin A producing
S. erythraea strain (or an eryB or eryC mutant, as necessary) to determine the level
at which megalomicin A can be produced. Furthermore, if experience shows that
high level megalomicin A production requires a higher level of resistance to this
macrolide than present in S. erythraea or S. lividans, the necessary megalomicin
self-resistance genes will be cloned from M. megalomicea and moved into either
one of the heterologous hosts. This will be straightforward work since
self-resistance genes are usually found in the cluster of macrolide biosynthesis genes
and can be identified by their homology to known macrolide resistance genes
and/or by the resistance phenotype they impart to a strain that normally is
sensitive.

Alternatively, geneX and geneY (Figure 5) can be added to cassettes
containing the relevant daunosamine (dnm) biosynthesis genes (Figure 5) to
provide the ability to make TDP-megosamine in vivo and attach it to an
erythromycin aglycone. The TDP-daunosamine biosynthesis genes can be
re-cloned from Streptomyces peucetius on two compatible and mutually selectable
plasmids. When an S. lividans strain containing these two plasmids and the dnmS
gene for TDP-daunosamine glycosyltransferase is grown in the presence of added
epsilon-rhodomycinone, its glycoside with L-daunosamine, called rhodomycin D,
is produced in good yield. Thus, bioconversion of one of the erythromycins to
megalomicin A should be observed when geneX and geneY are present. One can
construct all five combinations - the two N-dimethyltransferase genes and the three
glycosyltransferase genes - to discriminate geneX and geneY from those connected
with mycarose and desosamine biosynthesis and attachment in the megalomicin
pathway.

Because the timing of megosamine addition is unknown, one can test
erythronolide B, 3-alpha-mycarosylerythronolide B, erythromycin D and
erythromycin C as substrates provided to a strain that expresses the megosamine
biosynthetic and transferase genes. There is no need to test the C3''' and/or C4'''
acylated metabolites like megalomicin C1, because these metabolites are made
from megalomicin A and not the converse, based on the precedents in the
biosynthesis of tylosin (see Arisawa et al., 1994, Appl. Environ. Microbiol. 60:
2657-2661), carbomycin (see Epp et al., 1989, Gene 85:293-301), and
midecamycin (see Hara and Hutchinson, 1992, J. Bacteriol. 174: 5141-5144). If
C-6 glycosylation of erythronolide B or 3-alpha-mycarosylerythronolide B (Figure
5) happens before addition of desosamine to C-5, then the erythromycin genes
might not be able to complete formation of megalomicin A from some mono- or
diglycoside if the erythromycin glycosyltransferases cannot tolerate a C-6
glycoside. Although unexpected, such an outcome could be circumvented in
accordance with the methods of the invention by cloning further megalomicin
biosynthesis genes into the appropriate S. erythraea background or into S. lividans
- specifically, the necessary deoxysugar biosynthesis and attachment genes - to
create a recombinant strain that produces megalomicin A.

The acyltransferase gene that adds acetate or propionate to the C3''' or
C4''' positions of mycarose in megalomicin B, C1 and C2 (Figure 3) is contained
within the cosmids of the invention and can be identified by scanning the sequence
data for the megalomicin gene cluster to locate homologs of carE and mdmB or
their acyA homologs from the tylosin producer. The carE and acyA genes govern
C4''' acylation in the carbomycin and tylosin pathways, respectively. The
megalomicin homolog has the equivalent function in megalomicin biosynthesis
(but is specific for C3''' and C4''' acylation). The gene can be cloned under
control of a suitable promoter and introduced into S. lividans to produce the
desired acyl derivative of megalomicin A. Alternatively, introduction of the carE
gene can form megalomicin B. This gene can be cloned from the carbomycin,
spiramycin or tylosin producers.

If the amount of megalomicin produced by an S. erythraea or S. lividans or
other recombinant host cell is less than desired, yield can be improved by
optimizing the growth medium and fermentation conditions, by increasing
expression of the gene(s) that appear to be rate limiting, based on the level of
pathway intermediates that are accumulated by the recombinant strain constructed,
and by reconstructing the ery, dnm, and megalomicin biosynthesis genes on
vectors like pSET152 that can be integrated into the genome to provide a more
stable recombinant strain for strain improvement.

In another embodiment, the present invention provides recombinant
vectors encoding one or more of the megosamine, desosamine, and mycarose
biosynthetic and transfer genes and heterologous host cells comprising those
vectors. In this embodiment of the invention, the heterologous host cell is typically
a cell that is unable to produce the sugar and transfer it to a polyketide unless the
vector of the invention is introduced. For example, neither Streptomyces lividans
nor S. coelicolor is naturally capable of making megosamine, desosamine, or
mycarose or transferring those moieties to a polyketide. However, the present
invention provides recombinant Streptomyces lividans and S. coelicolor host cells
that are capable of making megosamine, desosamine, and/or mycarose and
transferring those moieties to a polyketide.

Moreover, additional recombinant gene products can be expressed in the
host cell to improve production of a desired polyketide. As but one non-limiting
example, certain of the recombinant PKS proteins of the invention may produce a
polyketide other than or in addition to the predicted polyketide, because the
polyketide is cleaved from the PKS by the thioesterase (TE) domain in module 6
prior to processing by other domains on the PKS, in particular, any KR, DH,
and/or ER domains in module 6. The production of the predicted polyketide can
be increased in such instances by deleting the TE domain coding sequences from
the gene and, optionally, expressing the TE domain as a separate protein. See
Gokhale et al., Feb. 1999, "Mechanism and specificity of the terminal thioesterase
domain from the erythromycin polyketide synthase," Chem. & Biol. 6: 117-125,
incorporated herein by reference.

Thus, in one important aspect, the present invention provides methods,
expression vectors, and recombinant host cells that enable the production of
megalomicin and hydroxylated and glycosylated derivatives of megalomicin in
heterologous host cells. The present invention also provides methods for making a
wide variety of polyketides derived in part from the megalomicin PKS or other
biosynthetic genes, as described in the following Section.

Section VI: Hybrid PKS Genes

The present invention provides recombinant DNA compounds encoding
each of the domains of each of the modules of the megalomicin PKS as well as the
other megalomicin biosynthetic enzymes. The availability of these compounds
permits their use in recombinant procedures for production of desired portions of
the megalomicin PKS fused to or expressed in conjunction with all or a portion of
a heterologous PKS and, optionally, one or more polyketide modification
enzymes. These compounds also permit the modification of polyketides with the
various megalomicin modification enzymes. The resulting hybrid PKS can then be
expressed in a host cell to produce a desired polyketide or modified form thereof.

Thus, in accordance with the methods of the invention, a portion of the
megalomicin biosynthetic gene coding sequence that encodes a particular activity
can be isolated and manipulated, for example, to replace the corresponding region
in a different modular PKS gene or modification enzyme gene. In addition, coding
sequences for individual proteins, modules, domains, and portions thereof of the
megalomicin PKS can be ligated into suitable expression systems and used to
produce the portion of the protein encoded. The resulting protein can be isolated
and purified or can be employed in situ to effect polyketide synthesis.
Depending on the host for the recombinant production of the domain, module,
protein, or combination of proteins, suitable control sequences such as promoters,
termination sequences, enhancers, and the like are ligated to the nucleotide
sequence encoding the desired protein in the construction of the expression vector,
as described above.

In one important embodiment, the invention thus provides hybrid PKS
enzymes and the corresponding recombinant DNA compounds that encode those
hybrid PKS enzymes. For purposes of the invention, a hybrid PKS is a
recombinant PKS that comprises all or part of one or more extender modules,
loading module, and/or thioesterase/cyclase domain of a first PKS and all or part
of one or more extender modules, loading module, and/or thioesterase/cyclase
domain of a second PKS. In one preferred embodiment, the first PKS is most but
not all of the megalomicin PKS, and the second PKS is only a portion of a
non-megalomicin PKS. An illustrative example of such a hybrid PKS includes a
megalomicin PKS in which the megalomicin PKS loading module has been
replaced with a loading module of another PKS. Another example of such a hybrid
PKS is a megalomicin PKS in which the AT domain of extender module 3 is
replaced with an AT domain that binds only malonyl CoA. In another preferred
embodiment, the first PKS is most but not all of a non-megalomicin PKS, and the
second PKS is only a portion of the megalomicin PKS. An illustrative example of
such a hybrid PKS includes a rapamycin PKS in which an AT specific for malonyl
CoA is replaced with the AT from the megalomicin PKS specific for
methylmalonyl CoA. Other illustrative hybrid PKSs of the invention are described
below.

Those of skill in the art will recognize that all or part of either the first or
second PKS in a hybrid PKS of the invention need not be isolated from a naturally
occurring source. For example, only a small portion of an AT domain determines
its specificity. See PCT patent application No. WO US99/15047, and Lau et al.,
infra, incorporated herein by reference. The state of the art in DNA synthesis
allows the artisan to construct de novo DNA compounds of size sufficient to
construct a useful portion of a PKS module or domain. Thus, the desired
derivative coding sequences can be synthesized using standard solid phase
synthesis methods such as those described by Jaye et al., 1984, J. Biol. Chem. 259:
6331, and instruments for automated synthesis are available commercially from,
for example, Applied Biosystems, Inc. For purposes of the invention, such
synthetic DNA compounds are deemed to be a portion of a PKS.

With this general background regarding hybrid PKSs of the invention, one
can better appreciate the benefit provided by the DNA compounds of the invention
that encode the individual domains, modules, and proteins that comprise the
megalomicin PKS. As described above, the megalomicin PKS is comprised of a
loading module, six extender modules each composed of a KS, AT, ACP, and zero,
one, two, or three KR, DH, and ER domains, and a thioesterase domain. The DNA
compounds of the invention that encode these domains individually or in
combination are useful in the construction of the hybrid PKS encoding DNA
compounds of the invention. For example, a DNA compound of the invention that
encodes an extender module or portion of an extender module is useful in the
construction of a coding sequence that encodes a protein subcomponent of a PKS.
The DNA compound of the invention that comprises a coding sequence of a PKS
subunit protein is useful in the construction of an expression vector that drives
expression of the subunit in a host cell that expresses the other subunits and so
produces a functional PKS.
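
The domain organization just described can be restated as a small data model.
The following Python sketch is illustrative only and is not part of the
disclosed methods. The Module class, the MEG_PKS tuple, the helper names, and
the label "KR0" for the inactive KR of extender module 3 are assumptions
introduced here, and the per-module domain content is inferred from the
module-by-module descriptions in this Section rather than from sequence data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One PKS unit: a name plus its ordered catalytic domains."""
    name: str
    domains: tuple


# Hypothetical encoding of the megalomicin PKS. "KR0" (our label) marks the
# inactive KR of extender module 3, so it is not counted as a reductive domain.
MEG_PKS = (
    Module("loading", ("AT", "ACP")),
    Module("ext1", ("KS", "AT", "KR", "ACP")),
    Module("ext2", ("KS", "AT", "KR", "ACP")),
    Module("ext3", ("KS", "AT", "KR0", "ACP")),
    Module("ext4", ("KS", "AT", "DH", "ER", "KR", "ACP")),
    Module("ext5", ("KS", "AT", "KR", "ACP")),
    Module("ext6", ("KS", "AT", "KR", "ACP")),
    Module("TE", ("TE",)),
)


def extender_modules(pks):
    """Return the modules that carry a KS domain, i.e. the extender modules."""
    return tuple(m for m in pks if "KS" in m.domains)


def reductive_domains(module):
    """Return the active KR, DH, and ER domains of a module, in listed order."""
    return tuple(d for d in module.domains if d in ("KR", "DH", "ER"))


if __name__ == "__main__":
    for m in extender_modules(MEG_PKS):
        print(m.name, reductive_domains(m))
```

Keeping each module as an ordered tuple of domain labels makes the "zero, one,
two, or three" reductive domains per extender module directly countable.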

The recombinant DNA compounds of the invention that encode the
loading module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
loading module is inserted into a DNA compound that comprises the coding
sequence for one or more heterologous PKS extender modules. The resulting
construct, in which the coding sequence for the loading module of the
heterologous PKS is replaced by the coding sequence for the megalomicin
PKS loading module, provides a novel PKS. Examples of such heterologous PKS
coding sequences include the DEBS, rapamycin, FK-506, FK-520, rifamycin, and
avermectin PKS coding sequences. In another embodiment, a DNA compound
comprising a sequence that encodes the megalomicin PKS loading module is
inserted into a DNA compound that comprises the coding sequence for the
megalomicin PKS or a recombinant megalomicin PKS that produces a megalomicin
derivative.

In another embodiment, a portion of the loading module coding sequence
is utilized in conjunction with a heterologous coding sequence. In this embodiment,
the invention provides, for example, replacing the methylmalonyl CoA (propionyl)
specific AT with a malonyl CoA (acetyl), ethylmalonyl CoA (butyryl), or other
CoA specific AT. In addition, the AT and/or ACP can be replaced by another AT
and/or another ACP or an inactivated KS, such as a KSQ, an AT, and/or another
ACP. The resulting heterologous loading module coding sequence can be utilized
in conjunction with a coding sequence for a PKS that synthesizes megalomicin, a
megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the first
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS first
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
first extender module of the megalomicin PKS or the latter is merely added to
coding sequences for modules of the heterologous PKS, provides a novel PKS
coding sequence. In another embodiment, a DNA compound comprising a
sequence that encodes the first extender module of the megalomicin PKS is
inserted into a DNA compound that comprises coding sequences for the
megalomicin PKS or a recombinant megalomicin PKS that produces a
megalomicin derivative.

In another embodiment, a portion or all of the first extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting (which includes inactivating) the
KR; inserting a DH or a DH and ER; and/or replacing the KR with another KR, a
DH and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a gene
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous first extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide.

Those of skill in the art will recognize, however, that deletion of the KR
domain of extender module 1 or insertion of a DH domain or DH and KR domains
into extender module 1 will prevent the typical cyclization of the polyketide at the
hydroxyl group created by the KR if such hybrid module is employed as a first
extender module in a hybrid PKS or is otherwise involved in producing a portion
of the polyketide at which cyclization is to occur. Such deletions or insertions can
be useful, however, to create linear molecules or to induce cyclization at another
site in the molecule.
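
The effect of changing the reductive domains of a module can be summarized as a
lookup from the set of active domains to the functional group left at the
beta-carbon. The Python sketch below is a simplified illustration, not a
predictive tool. It assumes the generally accepted order of beta-carbon
processing (KR reduces the keto group to a hydroxyl, DH dehydrates the hydroxyl
to a double bond, ER reduces the double bond to a methylene). The function
names and the returned labels are hypothetical.

```python
def beta_carbon_state(domains):
    """Map a module's active reductive domains to the beta-carbon outcome.

    `domains` is any iterable of domain labels such as ("KS", "AT", "KR", "ACP").
    Only the labels "KR", "DH", and "ER" are treated as active reductive domains.
    """
    active = {d for d in domains if d in ("KR", "DH", "ER")}
    if not active:
        return "keto"        # no reduction: the beta-keto group is retained
    if active == {"KR"}:
        return "hydroxyl"    # ketoreduction only
    if active == {"KR", "DH"}:
        return "alkene"      # reduction followed by dehydration
    if active == {"KR", "DH", "ER"}:
        return "methylene"   # full reduction
    # A DH or ER without the upstream step has no substrate in this model.
    raise ValueError(f"unsupported combination: {sorted(active)}")


def cyclizes_at_hydroxyl(domains):
    """True only if the module leaves a hydroxyl available for cyclization."""
    return beta_carbon_state(domains) == "hydroxyl"


if __name__ == "__main__":
    native_module_1 = ("KS", "AT", "KR", "ACP")
    kr_deleted = ("KS", "AT", "ACP")
    print(beta_carbon_state(native_module_1), cyclizes_at_hydroxyl(native_module_1))
    print(beta_carbon_state(kr_deleted), cyclizes_at_hydroxyl(kr_deleted))
```

In this model, deleting the KR of extender module 1 or adding a DH to it removes
the hydroxyl, which corresponds to the loss of the usual cyclization site noted
above.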

As noted above, the invention also provides recombinant PKSs and
recombinant DNA compounds and vectors that encode such PKSs in which the
KS domain of the first extender module has been inactivated. Such constructs are
typically expressed in translational reading frame with the first two extender
modules on a single protein, with the remaining modules and domains of a
megalomicin, megalomicin derivative, or hybrid PKS expressed as one or more,
typically two, proteins to form the multi-protein functional PKS. The utility of
these constructs is that host cells expressing, or cell free extracts containing, the
PKS encoded thereby can be fed or supplied with N-acylcysteamine thioesters of
precursor molecules to prepare megalomicin derivative compounds. See U.S.
patent application Serial No. 09/492,733, filed 27 Jan. 2000, and PCT publication
Nos. WO 00/44717, 99/03986 and 97/02358, each of which is incorporated herein
by reference.

The recombinant DNA compounds of the invention that encode the second
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
second extender module is inserted into a DNA compound that comprises the
coding sequence for a heterologous PKS. The resulting construct, in which the
coding sequence for a module of the heterologous PKS is either replaced by that
for the second extender module of the megalomicin PKS or the latter is merely
added to coding sequences for the modules of the heterologous PKS, provides a
novel PKS. In another embodiment, a DNA compound comprising a sequence that
encodes the second extender module of the megalomicin PKS is inserted into a
DNA compound that comprises the coding sequences for the megalomicin PKS or
a recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the second extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
replacing the KR with another KR, a KR and a DH, or a KR, DH, and ER; and/or
inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a
coding sequence for a PKS that produces a polyketide other than megalomicin, or
from chemical synthesis. The resulting heterologous second extender module
coding sequence can be utilized in conjunction with a coding sequence from a
PKS that synthesizes megalomicin, a megalomicin derivative, or another
polyketide.

The recombinant DNA compounds of the invention that encode the third
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS third
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
third extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the third extender module of the megalomicin PKS is inserted into a DNA
compound that comprises coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the third extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting the inactive KR; and/or
replacing the KR with an active KR, or a KR and DH, or a KR, DH, and ER. In
addition, the KS and/or ACP can be replaced with another KS and/or ACP. In
each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER,
or ACP coding sequence can originate from a coding sequence for another module
of the megalomicin PKS, from a gene for a PKS that produces a polyketide other
than megalomicin, or from chemical synthesis. The resulting heterologous third
extender module coding sequence can be utilized in conjunction with a coding
sequence for a PKS that synthesizes megalomicin, a megalomicin derivative, or
another polyketide.

The recombinant DNA compounds of the invention that encode the fourth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS fourth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
fourth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the fourth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion of the fourth extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting or inactivating any one, two, or all
three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER,
DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER. In addition,
the KS and/or ACP can be replaced with another KS and/or ACP. In each of these
replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding
sequence can originate from a coding sequence for another module of the
megalomicin PKS (except for the DH and ER domains), from a coding sequence
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous fourth extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the fifth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS fifth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
fifth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the fifth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequence for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the fifth extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH
and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding
sequence for a PKS that produces a polyketide other than megalomicin, or from
chemical synthesis. The resulting heterologous fifth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the sixth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS sixth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
sixth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the sixth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the sixth extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting or inactivating the KR or
replacing the KR with another KR, a KR and DH, or a KR, DH, and an ER; and/or
inserting a DH or a DH and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding
sequence for a PKS that produces a polyketide other than megalomicin, or from
chemical synthesis. The resulting heterologous sixth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The sixth extender module of the megalomicin PKS is followed by a
thioesterase domain. This domain is important in the cyclization of the polyketide
and its cleavage from the PKS. The present invention provides recombinant DNA
compounds that encode hybrid PKS enzymes in which the megalomicin PKS is
fused to a heterologous thioesterase or a heterologous PKS is fused to the
megalomicin PKS thioesterase. Thus, for example, a thioesterase domain coding
sequence from another PKS gene can be inserted at the end of the sixth (or other
final) extender module coding sequence in recombinant DNA compounds of the
invention or the megalomicin PKS thioesterase can be similarly fused to a
heterologous PKS. Recombinant DNA compounds encoding this thioesterase
domain are useful in constructing DNA compounds that encode the megalomicin
PKS, a PKS that produces a megalomicin derivative, and a PKS that produces a
polyketide other than megalomicin or a megalomicin derivative.

Thus, the hybrid modules of the invention are incorporated into a PKS to
provide a hybrid PKS of the invention. A hybrid PKS of the invention can result
not only:

(i) from fusions of heterologous domain (where heterologous means the
domains in a module are derived from at least two different naturally occurring
modules) coding sequences to produce a hybrid module coding sequence
contained in a PKS gene whose product is incorporated into a PKS,

but also:

(ii) from fusions of heterologous modules (where heterologous module
means two modules are adjacent to one another that are not adjacent to one
another in naturally occurring PKS enzymes) coding sequences to produce a
hybrid coding sequence contained in a PKS gene whose product is incorporated
into a PKS,

(iii) from expression of one or more megalomicin PKS genes with one or
more non-megalomicin PKS genes, including both naturally occurring and
recombinant non-megalomicin PKS genes, and

(iv) from combinations of the foregoing.

Various hybrid PKSs of the invention illustrating these various alternatives are
described herein.

An example of a hybrid PKS comprising fused modules results from
fusion of the loading module of either the DEBS PKS or the narbonolide PKS (see
PCT patent application No. US99/11814, incorporated herein by reference) with
extender modules 1 and 2 of the megalomicin PKS to produce a hybrid megAI
gene. Co-expression of either one of these two hybrid megAI genes with the
megAII and megAIII genes in suitable host cells, such as Streptomyces lividans,
results in expression of a hybrid PKS of the invention that produces
6-deoxyerythronolide B (the polyketide product of the natural megA genes) in
recombinant host cells. Co-expression of either one of these two hybrid megAI
genes with the eryAII and eryAIII genes similarly results in the production of
6-dEB, while co-expression with the analogous narbonolide PKS genes, picAII,
picAIII and picAIV, results in the production of 3-deoxy-3-oxo-6-dEB
(3-keto-6-dEB), useful in the production of ketolides, compounds with potent
anti-bacterial activity.

Another example of a hybrid PKS comprising a hybrid module is prepared
by co-expressing the megAI and megAII genes with a megAIII hybrid gene
encoding extender module 5 and the KS and AT of extender module 6 of the
megalomicin PKS fused to the ACP of module 6 and the TE of the narbonolide
PKS. The resulting hybrid PKS of the invention produces 3-keto-6-dEB. This
compound can also be prepared by a recombinant megalomicin derivative PKS of
the invention in which the KR domain of module 6 of the megalomicin PKS has
been deleted. Moreover, the invention provides hybrid PKSs in which not only the
above changes have been made but also the AT domain of module 6 has been
replaced with a malonyl-specific AT. These hybrid PKSs produce
2-desmethyl-3-deoxy-3-oxo-6-dEB, a useful intermediate in the preparation of
2-desmethyl ketolides, compounds with potent antibiotic activity.

Another illustrative example of a hybrid PKS includes the hybrid PKS of
the invention resulting only from the latter change in the hybrid PKS just
described. Thus, co-expression of the megAI and megAII genes with a hybrid
megAIII gene in which the AT domain of module 6 has been replaced by a
malonyl-specific AT results in the expression of a hybrid PKS that produces
2-desmethyl-6-dEB in recombinant host cells. This compound is a useful
intermediate for making 2-desmethyl erythromycins in recombinant host cells of
the invention, as well as for making 2-desmethyl semi-synthetic ketolides.
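
The module 6 examples above pair two independent changes (deleting the KR
domain, replacing the AT domain with a malonyl-specific AT) with four named
products. The short Python sketch below simply tabulates that pairing using the
compound names given in the text. The function name and its boolean parameters
are hypothetical, and the sketch makes no claim about yields or about whether a
given construct is functional.

```python
def module6_product(kr_deleted: bool, at_malonyl: bool) -> str:
    """Name the aglycone described in the text for a given module 6 variant.

    kr_deleted: the KR domain of module 6 is deleted, leaving a 3-keto group.
    at_malonyl: the AT domain of module 6 is replaced by a malonyl-specific AT,
                which removes the methyl group at C-2.
    """
    prefix = "2-desmethyl-" if at_malonyl else ""
    core = "3-deoxy-3-oxo-6-dEB" if kr_deleted else "6-dEB"
    return prefix + core


if __name__ == "__main__":
    # List all four combinations of the two changes.
    for kr_deleted in (False, True):
        for at_malonyl in (False, True):
            print(kr_deleted, at_malonyl, module6_product(kr_deleted, at_malonyl))
```

Treating the two changes as independent switches mirrors how the text builds
the doubly modified PKS from the two singly modified ones.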

While many of the hybrid PKSs described above are composed primarily
of megalomicin PKS proteins, those of skill in the art recognize that the present
invention provides many different hybrid PKSs, including those composed of only
a small portion of the megalomicin PKS. For example, the present invention
provides a hybrid PKS in which a hybrid eryAI gene that encodes the megalomicin
PKS loading module fused to extender modules 1 and 2 of DEBS is coexpressed
with the eryAII and eryAIII genes. The resulting hybrid PKS produces 6-dEB, the
product of the native DEBS. When the construct is expressed in
Saccharopolyspora erythraea host cells (either via integration in the
chromosome or via a vector that encodes the hybrid PKS), the resulting
recombinant host cell of the invention produces erythromycins. Another
illustrative example is the hybrid PKS of the invention composed of the megAI
and eryAII and eryAIII gene products. This construct is also useful in expressing
erythromycins in Saccharopolyspora erythraea host cells. In a preferred
embodiment, the S. erythraea host cells are eryAI mutants that do not produce
6-deoxyerythronolide B.

Another example is the hybrid PKS of the invention composed of the
products of the picAI and picAII genes (the two proteins that comprise the loading
module and extender modules 1-4, inclusive, of the narbonolide PKS) and the
megAIII gene. The resulting hybrid PKS produces the macrolide aglycone
3-hydroxy-narbonolide in Streptomyces lividans host cells and the corresponding
erythromycins in Saccharopolyspora erythraea host cells.

Each of the foregoing hybrid PKS enzymes of the invention, and the hybrid
PKS enzymes of the invention generally, can be expressed in a host cell that also
expresses a functional oleP gene product. The oleP gene encodes an oleandomycin
modification enzyme, and expression of the gene together with a hybrid PKS of
the invention provides the compounds of the invention in which a C-8 hydroxyl, a
C-8a or C-8-C-8a epoxide is present.

Recombinant methods for manipulating modular PKS genes to make
hybrid PKS enzymes are described in U.S. Patent Nos. 5,672,491; 5,843,718;
5,830,750; and 5,712,146; and in PCT publication Nos. 98/49315 and 97/02358,
each of which is incorporated herein by reference. A number of genetic
engineering strategies have been used with DEBS to demonstrate that the
structures of polyketides can be manipulated to produce novel natural products,
primarily analogs of the erythromycins (see the patent publications referenced
supra and Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329, and Baltz, 1998,
Trends Microbiol. 6: 76-83, incorporated herein by reference). Because of the
similar activity of the megalomicin PKS and DEBS (both PKS enzymes produce
the macrolide aglycone 6-dEB), these methods can be readily applied to the
recombinant megalomicin PKS genes of the invention.




These techniques include: (i) deletion or insertion of modules to control
chain length, (ii) inactivation of reduction/dehydration domains to bypass
beta-carbon processing steps, (iii) substitution of AT domains to alter starter and
extender units, (iv) addition of reduction/dehydration domains to introduce
catalytic activities, and (v) substitution of ketoreductase (KR) domains to control
hydroxyl stereochemistry. In addition, engineered blocked mutants of DEBS have
been used for precursor directed biosynthesis of analogs that incorporate
synthetically derived starter units. For example, more than 100 novel polyketides
were produced by engineering single and combinatorial changes in multiple
modules of DEBS. Hybrid PKS enzymes based on DEBS with up to three catalytic
domain substitutions were constructed by cassette mutagenesis, in which various
DEBS domains were replaced with domains from the rapamycin PKS (see
Schwecke et al., 1995, Proc. Natl. Acad. Sci. USA 92: 7839-7843, incorporated
herein by reference) or one or more of the DEBS KR domains was deleted.
Functional single domain replacements or deletions were combined to generate
DEBS enzymes with double and triple catalytic domain substitutions (see
McDaniel et al., 1999, Proc. Natl. Acad. Sci. USA 96: 1846-1851, incorporated
herein by reference). By providing the analogous megalomicin/rapamycin hybrid
PKS enzymes, the present invention provides alternative means to make these
polyketides.
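
The way functional single-domain changes can be combined into double and triple substitutions may be sketched as follows. This is an illustrative Python sketch only: the module positions and the replacement labels in SINGLE_CHANGES are hypothetical placeholders and are not taken from the cited references.

```python
from itertools import combinations

# Hypothetical single-domain changes, keyed by (module, domain) site.
# The labels are placeholders for illustration, not data from the cited work.
SINGLE_CHANGES = {
    ("module 2", "AT"): "AT replaced with a rapamycin PKS AT",
    ("module 5", "KR"): "KR deleted",
    ("module 6", "KR"): "KR replaced with a rapamycin PKS cassette",
}


def combined_substitutions(changes, max_size=3):
    """Return every design that combines 1..max_size changes at distinct sites."""
    sites = sorted(changes)
    designs = []
    for size in range(1, max_size + 1):
        for subset in combinations(sites, size):
            designs.append({site: changes[site] for site in subset})
    return designs


designs = combined_substitutions(SINGLE_CHANGES)
for design in designs:
    print(len(design), "change(s):", "; ".join(design.values()))
```

Because each site appears once as a dictionary key, every combination alters distinct domains, which mirrors the idea of stacking single changes that were each shown to be functional.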

Methods for generating libraries of polyketides have been greatly improved
by cloning PKS genes as a set of three or more mutually selectable plasmids, each
carrying a different wild-type or mutant PKS gene, then introducing all possible
combinations of the plasmids with wild-type, mutant, and hybrid PKS coding
sequences into the same host (see U.S. patent application Serial No. 60/129,731,
filed 16 Apr. 1999, and PCT Pub. No. 98/27203, each of which is incorporated
herein by reference). This method can also incorporate the use of a KS1° mutant,
which by mutational biosynthesis can produce polyketides made from diketide
starter units (see Jacobsen et al., 1997, Science 277, 367-369, incorporated herein
by reference), as well as the use of a truncated gene that leads to 12-membered
macrolides or an elongated gene that leads to 16-membered ketolides. Moreover,
by utilizing in addition one or more vectors that encode glycosyl biosynthesis and
transfer genes, such as those of the present invention for megosamine,
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desosamine, oleandrose, cladinose, and/or mycarose (in any combination), a large 
collection of glycosylated polyketides can be prepared.
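
The multi-plasmid approach described above amounts to taking every combination of one variant per plasmid. The following Python sketch illustrates the counting only; the plasmid names and variant labels are hypothetical and are chosen for illustration.

```python
from itertools import product

# Hypothetical variants of each PKS gene, one gene per mutually selectable plasmid.
PLASMID_VARIANTS = {
    "plasmid A (PKS gene I)": ["wild-type", "KS1 null mutant"],
    "plasmid B (PKS gene II)": ["wild-type", "AT swap in module 3", "KR deletion in module 4"],
    "plasmid C (PKS gene III)": ["wild-type", "AT swap in module 6"],
}


def enumerate_hosts(variants):
    """Return one dict per host, pairing each plasmid with one of its variants."""
    names = list(variants)
    return [
        dict(zip(names, combo))
        for combo in product(*(variants[name] for name in names))
    ]


hosts = enumerate_hosts(PLASMID_VARIANTS)
print(len(hosts), "plasmid combinations")
```

With 2, 3, and 2 variants on the three plasmids, the sketch yields 2 x 3 x 2 = 12 hosts, so a handful of variants per gene grows quickly into a sizable library.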

The following Table lists references describing illustrative PKS genes and
corresponding enzymes that can be utilized in the construction of the recombinant
hybrid PKSs and the corresponding DNA compounds that encode them of the
invention. Also presented are various references describing tailoring enzymes and
corresponding genes that can be employed in accordance with the methods of the
invention.

Avermectin
    U.S. Pat. No. 5,252,474 to Merck.
    MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied
    Molecular Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A
    Comparison of the Genes Encoding the Polyketide Synthases for Avermectin,
    Erythromycin, and Nemadectin.
    MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the
    Streptomyces avermitilis genes encoding the avermectin polyketide synthase.

Candicidin (FR008)
    Hu et al., 1994, Mol. Microbiol. 14: 163-172.

Epothilone
    PCT Pub. No. 00/031247 to Kosan.

Erythromycin
    PCT Pub. No. 93/13663 to Abbott.
    US Pat. No. 5,824,513 to Abbott.
    Donadio et al., 1991, Science 252:675-9.
    Cortes et al., 8 Nov. 1990, Nature 348: 176-8, An unusually large
    multifunctional polypeptide in the erythromycin producing polyketide synthase of
    Saccharopolyspora erythraea.

Glycosylation Enzymes
    PCT Pub. No. 97/23630 to Abbott.

FK-506
    Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone
    ring of the immunosuppressant FK506, Eur. J. Biochem. 256: 528-534.
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Motamedi et aL, 1997, Structural organization of a multifunctional 
polyketide synthase involved in the biosynthesis of the macrolide 
immunosuppressant FK506, Eur. J. Biochem. 244: 74-80. 

Methyltransferase
    US 5,264,355, issued 23 Nov. 1993, Methylating enzyme from
    Streptomyces MA6858. 31-O-desmethyl-FK506 methyltransferase.
    Motamedi et al., 1996, Characterization of methyltransferase and
    hydroxylase genes involved in the biosynthesis of the immunosuppressants FK506
    and FK520, J. Bacteriol. 178: 5243-5248.

FK-520
    PCT Pub. No. 00/20601 to Kosan.
    See also Nielsen et al., 1991, Biochem. 30:5789-96 (enzymology of
    pipecolate incorporation).

Lovastatin
    U.S. Pat. No. 5,744,350 to Merck.

Narbomycin (and Picromycin)
    PCT Pub. No. WO US99/61599 to Kosan.

Nemadectin
    MacNeil et al., 1993, supra.

Niddamycin
    Kakavas et al., 1997, Identification and characterization of the niddamycin
    polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179:
    7515-7522.

Oleandomycin
    Swan et al., 1994, Characterization of a Streptomyces antibioticus gene
    encoding a type I polyketide synthase which has an unusual coding sequence, Mol.
    Gen. Genet. 242: 358-362.
    PCT Pub. No. 00/026349 to Kosan.
    Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal
    region involved in oleandomycin biosynthesis, which encodes two
    glycosyltransferases responsible for glycosylation of the macrolactone ring, Mol.
    Gen. Genet. 259(3): 299-308.

Platenolide 
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EP Pub. No. 791,656 to Lilly. 
Rapamycin
    Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the
    polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92:7839-7843.
    Aparicio et al., 1996, Organization of the biosynthetic gene cluster for
    rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in
    the modular polyketide synthase, Gene 169: 9-16.

Rifamycin
    August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic
    rifamycin: deductions from the molecular analysis of the rif biosynthetic gene
    cluster of Amycolatopsis mediterranei S699, Chemistry & Biology, 5(2): 69-79.

Soraphen
    U.S. Pat. No. 5,716,849 to Novartis.
    Schupp et al., 1995, J. Bacteriology 177: 3673-3679. A Sorangium
    cellulosum (Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide
    Antibiotic Soraphen A: Cloning, Characterization, and Homology to Polyketide
    Synthase Genes from Actinomycetes.

Spiramycin
    U.S. Pat. No. 5,098,837 to Lilly.

Activator Gene
    U.S. Pat. No. 5,514,544 to Lilly.

Tylosin
    EP Pub. No. 791,655 to Lilly.
    Kuhstoss et al., 1996, Gene 183:231-6, Production of a novel polyketide
    through the construction of a hybrid polyketide synthase.
    U.S. Pat. No. 5,876,991 to Lilly.

Tailoring enzymes
    Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355.
    Analysis of five tylosin biosynthetic genes from the tylBA region of the
    Streptomyces fradiae genome.

As the above Table illustrates, there are a wide variety of PKS genes that serve as 
readily available sources of DNA and sequence information for use in constructing 
the hybrid PKS-encoding DNA compounds of the invention. 
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In constructing hybrid PKSs of the invention, certain general methods may 
be helpful. For example, it is often beneficial to retain the framework of the
module to be altered to make the hybrid PKS. Thus, if one desires to add DH and
ER functionalities to a module, it is often preferred to replace the KR domain of
the original module with a cognate KR, DH, and ER domain-containing segment
from another module, instead of merely inserting DH and ER domains. One can
alter the stereochemical specificity of a module by replacement of the KS domain
with a KS domain from a module that specifies a different stereochemistry. See
Lau et al., 1999, "Dissecting the role of acyltransferase domains of modular
polyketide synthases in the choice and stereochemical fate of extender units"
Biochemistry 38(5): 1643-1651, incorporated herein by reference. One can alter the
specificity of an AT domain by changing only a small segment of the domain. See
Lau et al., supra. One can also take advantage of known linker regions in PKS
proteins to link modules from two different PKSs to create a hybrid PKS. See
Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting Intermodular
Communication in Polyketide Synthases", Science 284: 482-485, incorporated
herein by reference.

The hybrid PKS-encoding DNA compounds of the invention can be and
often are hybrids of more than two PKS genes. Even where only two genes are
used, there are often two or more modules in the hybrid gene in which all or part
of the module is derived from a second (or third) PKS gene. Thus, as one
illustrative example, the invention provides a hybrid PKS that contains the
naturally occurring loading module and thioesterase domain as well as extender
modules one, two, four, and six of the megalomicin PKS and further contains
hybrid or heterologous extender modules three and five. Hybrid or heterologous
extender modules three and five contain AT domains specific for malonyl CoA
and derived from, for example, the rapamycin PKS genes.

The invention also provides libraries of PKS genes, PKS proteins, and
ultimately, of polyketides, that are constructed by generating modifications in the
megalomicin PKS so that the protein complexes produced have altered activities
in one or more respects and thus produce polyketides other than the natural
product of the PKS. Novel polyketides may thus be prepared, or polyketides in
general prepared more readily, using this method. By providing a large number of
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different genes or gene clusters derived from a naturally occurring PKS gene 
cluster, each of which has been modified in a different way from the native cluster,
an effectively combinatorial library of polyketides can be produced as a result of
the multiple variations in these activities. As will be further described below, the
metes and bounds of this embodiment of the invention can be described on the
polyketide, protein, and the encoding nucleotide sequence levels.

As described above, a modular PKS "derived from" the megalomicin or
other naturally occurring PKS includes a modular PKS (or its corresponding
encoding gene(s)) that retains the scaffolding of the utilized portion of the
naturally occurring gene. Not all modules need be included in the constructs;
however, the constructs can also comprise more than six modules. On the constant
scaffold, at least one enzymatic activity is mutated, deleted, replaced, or inserted
so as to alter the activity of the resulting PKS relative to the original (native) PKS.
Alteration results when these activities are deleted or are replaced by a different
version of the activity, or simply mutated in such a way that a polyketide other
than the natural product results from these collective activities. This occurs
because there has been a resulting alteration of the starter unit and/or extender
unit, stereochemistry, chain length or cyclization, and/or reductive or dehydration
cycle outcome at a corresponding position in the product polyketide. Where a
deleted activity is replaced, the origin of the replacement activity may come from a
corresponding activity in a different naturally occurring PKS or from a different
region of the megalomicin PKS. Any or all of the megalomicin PKS genes may be
included in the derivative or portions of any of these may be included, but the
scaffolding of a functional PKS protein is retained in whatever derivative is
constructed. The derivative preferably contains a thioesterase activity from the
megalomicin or another PKS.

Thus, a PKS derived from the megalomicin PKS includes a PKS that
contains the scaffolding of all or a portion of the megalomicin PKS. The derived
PKS also contains at least two extender modules that are functional, preferably
three extender modules, and more preferably four or more extender modules, and
most preferably six extender modules. The derived PKS also contains mutations,
deletions, insertions, or replacements of one or more of the activities of the
functional modules of the megalomicin PKS so that the nature of the resulting
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polyketide is altered at both the protein and DNA sequence levels. Particular 
preferred embodiments include those wherein a KS, AT, or ACP domain has been
deleted or replaced by a version of the activity from a different PKS or from
another location within the same PKS. Also preferred are derivatives where at
least one non-condensation cycle enzymatic activity (KR, DH, or ER) has been
deleted or added or wherein any of these activities has been mutated so as to
change the structure of the polyketide synthesized by the PKS.

Conversely, also included within the definition of a PKS derived from the
megalomicin PKS are functional non-megalomicin PKS modules or their
encoding genes wherein at least one domain or coding sequence therefor of a
megalomicin PKS module has been inserted. Exemplary is the use of the
megalomicin AT for extender module 2, which accepts a methylmalonyl CoA
extender unit rather than malonyl CoA, to replace a malonyl specific AT in
another PKS. Other examples include insertion of portions of non-condensation
cycle enzymatic activities or other regions of megalomicin synthase activity into a
heterologous PKS at both the DNA and protein levels.

Thus, there are at least five degrees of freedom for constructing a hybrid
PKS in terms of the polyketide that will be produced. First, the polyketide chain
length is determined by the number of extender modules in the PKS, and the
present invention includes hybrid PKSs that contain 6, as well as fewer or more
than 6, extender modules. Second, the nature of the carbon skeleton of the
polyketide is determined by the specificities of the acyl transferases that determine
the nature of the extender units at each position, e.g., malonyl, methylmalonyl,
ethylmalonyl, or other substituted malonyl. Third, the loading module specificity
also has an effect on the resulting carbon skeleton of the polyketide. The loading
module may use a different starter unit, such as acetyl, butyryl, and the like. As
noted above, another method for varying loading module specificity involves
inactivating the KS activity in extender module 1 (KS1) and providing alternative
substrates, called diketides, that are chemically synthesized analogs of extender
module 1 diketide products, for extender module 2. This approach was illustrated
in PCT publication Nos. 97/02358 and 99/03986, incorporated herein by reference,
wherein the KS1 activity was inactivated through mutation. Fourth, the oxidation
state at various positions of the polyketide will be determined by the dehydratase
and reductase
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portions of the modules. This will determine the presence and location of ketone 
and alcohol moieties and C-C double bonds or C-C single bonds in the polyketide. 

Finally, the stereochemistry of the resulting polyketide is a function of
three aspects of the synthase. The first aspect is related to the AT/KS specificity
associated with substituted malonyls as extender units, which affects
stereochemistry only when the reductive cycle is missing or when it contains only
a ketoreductase, as the dehydratase would abolish chirality. Second, the specificity
of the ketoreductase may determine the chirality of any beta-OH. Finally, the
enoylreductase specificity for substituted malonyls as extender units may influence
the stereochemistry when there is a complete KR/DH/ER available.
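
The relationship between the reductive domains of a module and the oxidation state at the corresponding position of the polyketide can be sketched as a simple lookup. The Python below is an illustrative sketch only: the module list is hypothetical and does not describe the megalomicin PKS or any other specific synthase, and the model ignores stereochemistry.

```python
def beta_carbon_state(domains):
    """Map the reductive domains present in a module to the beta-carbon outcome.

    The domains act in order (KR, then DH, then ER), so a DH or ER domain has
    no effect unless the preceding domain is also present.
    """
    present = set(domains)
    if "KR" not in present:
        return "ketone"
    if "DH" not in present:
        return "hydroxyl"
    if "ER" not in present:
        return "C=C double bond"
    return "C-C single bond"


# Hypothetical extender modules, for illustration only.
HYPOTHETICAL_MODULES = [
    {"extender": "methylmalonyl", "reductive": ["KR"]},
    {"extender": "malonyl", "reductive": []},
    {"extender": "methylmalonyl", "reductive": ["KR", "DH"]},
    {"extender": "methylmalonyl", "reductive": ["KR", "DH", "ER"]},
]

states = [beta_carbon_state(m["reductive"]) for m in HYPOTHETICAL_MODULES]
for number, (module, state) in enumerate(zip(HYPOTHETICAL_MODULES, states), start=1):
    print(f"module {number}: {module['extender']} extender, {state}")
```

Changing the number of entries in the list models the chain length variable, changing the extender label models AT specificity, and changing the reductive domain set models the oxidation state variable discussed above.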

Thus, the modular PKS systems generally and the megalomicin PKS
system particularly permit a wide range of polyketides to be synthesized. As
compared to the aromatic PKS systems, the modular PKS systems accept a wider
range of starter units, including aliphatic monomers (acetyl, propionyl, butyryl,
isovaleryl, and the like), aromatics (aminohydroxybenzoyl), alicyclics
(cyclohexanoyl), and heterocyclics (thiazolyl). Certain modular PKSs have relaxed
specificity for their starter units (Kao et al., 1994, Science, supra). Modular PKSs
also exhibit considerable variety with regard to the choice of extender units in
each condensation cycle. The degree of beta-ketoreduction following a
condensation reaction can be altered by genetic manipulation (Donadio et al.,
1991, Science, supra; Donadio et al., 1993, Proc. Natl. Acad. Sci. USA 90:
7119-7123). Likewise, the size of the polyketide product can be varied by designing
mutants with the appropriate number of modules (Kao et al., 1994, J. Am. Chem.
Soc. 116: 11612-11613). Lastly, modular PKS enzymes are particularly well
known for generating an impressive range of asymmetric centers in their products
in a highly controlled manner. The polyketides, antibiotics, and other compounds
produced by the methods of the invention are typically single stereoisomeric
forms. Although the compounds of the invention can occur as mixtures of
stereoisomers, it may be beneficial in some instances to generate individual
stereoisomers. Thus, the combinatorial potential within modular PKS pathways
based on any naturally occurring modular PKS scaffold, such as that of the
megalomicin PKS, is virtually unlimited.
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While hybrid PKSs are most often produced by "mixing and matching" 
portions of PKS coding sequences, mutations in DNA encoding a PKS can also be
used to introduce, alter, or delete an activity in the encoded polypeptide. Mutations
can be made to the native sequences using conventional techniques. The substrates
for mutation can be an entire cluster of genes or only one or two of them; the
substrate for mutation may also be portions of one or more of these genes.
Techniques for mutation include preparing synthetic oligonucleotides including
the mutations and inserting the mutated sequence into the gene encoding a PKS
subunit using restriction endonuclease digestion. See, e.g., Kunkel, 1985, Proc.
Natl. Acad. Sci. USA 82: 448; Geisselsoder et al., 1987, BioTechniques 5:786.
Alternatively, the mutations can be effected using a mismatched primer (generally
10-20 nucleotides in length) that hybridizes to the native nucleotide sequence, at a
temperature below the melting temperature of the mismatched duplex. The primer
can be made specific by keeping primer length and base composition within
relatively narrow limits and by keeping the mutant base centrally located. See
Zoller and Smith, 1983, Methods Enzymol. 100:468. Primer extension is effected
using DNA polymerase, the product cloned, and clones containing the mutated
DNA, derived by segregation of the primer extended strand, selected.
Identification can be accomplished using the mutant primer as a hybridization
probe. The technique is also applicable for generating multiple point mutations.
See, e.g., Dalbie-McFarland et al., 1982, Proc. Natl. Acad. Sci. USA 79: 6409.
PCR mutagenesis can also be used to effect the desired mutations.
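
The design rule for a mismatched primer (a short oligonucleotide with the mutant base centrally located) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the template sequence is made up, the function names are hypothetical, and the primer is written in the same sense as the input strand, so in practice it would anneal to the complementary strand. No melting temperature is calculated.

```python
def mutagenic_primer(template, position, new_base, flank=9):
    """Return a primer copying `template` around `position` with one base changed.

    `position` is 0-based. `flank` bases are kept on each side so that the
    mismatch sits in the center of a primer of length 2 * flank + 1.
    """
    template = template.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if position - flank < 0 or position + flank >= len(template):
        raise ValueError("not enough flanking sequence around the position")
    if template[position] == new_base:
        raise ValueError("new_base is identical to the native base")
    start = position - flank
    end = position + flank + 1
    return template[start:position] + new_base + template[position + 1:end]


def gc_fraction(sequence):
    """Fraction of G and C bases, a rough guide to base composition."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Made-up template sequence, for illustration only.
template = "ATGGCTAGCTAGGATCCGATCGATCGTTAA"
primer = mutagenic_primer(template, position=15, new_base="A")
print(primer, len(primer), round(gc_fraction(primer), 2))
```

With the default flank of 9 the primer is 19 nucleotides long, which falls within the 10-20 nucleotide range mentioned above.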

Random mutagenesis of selected portions of the nucleotide sequences
encoding enzymatic activities can also be accomplished by several different
techniques known in the art, e.g., by inserting an oligonucleotide linker randomly
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating
incorrect nucleotides during in vitro DNA synthesis, by error-prone PCR
mutagenesis, by preparing synthetic mutants, or by damaging plasmid DNA in
vitro with chemicals, in accordance with the methods of the present invention.
Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
nitrosoguanidine, hydroxylamine, agents which damage or remove bases thereby
preventing normal base-pairing such as hydrazine or formic acid, analogues of
nucleotide precursors such as 5-bromouracil, 2-aminopurine, or acridine
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intercalating agents such as proflavine, acriflavine, quinacrine, and the like. 
Generally, plasmid DNA or DNA fragments are treated with chemical mutagens,
transformed into E. coli, and propagated as a pool or library of mutant plasmids.

In constructing a hybrid PKS of the invention, regions encoding enzymatic
activity, i.e., regions encoding corresponding activities from different PKSs
or from different locations in the same PKS, can be recovered, for
example, using PCR techniques with appropriate primers. By "corresponding"
activity encoding regions is meant those regions encoding the same general type of
activity. For example, a KR activity encoded at one location of a gene cluster
"corresponds" to a KR activity encoded at another location in the gene cluster or
in a different gene cluster. Similarly, a complete reductase cycle could be
considered corresponding. For example, KR/DH/ER can correspond to a KR
alone.

If replacement of a particular target region in a host PKS is to be made,
this replacement can be conducted in vitro using suitable restriction enzymes. The
replacement can also be effected in vivo using recombinant techniques involving
homologous sequences framing the replacement gene in a donor plasmid and a
receptor region in a recipient plasmid. Such systems, advantageously involving
plasmids of differing temperature sensitivities, are described, for example, in PCT
publication No. WO 96/40968, incorporated herein by reference. The vectors used
to perform the various operations to replace the enzymatic activity in the host PKS
genes or to support mutations in these regions of the host PKS genes can be
chosen to contain control sequences operably linked to the resulting coding
sequences in a manner such that expression of the coding sequences can be
effected in an appropriate host.

However, simple cloning vectors may be used as well. If the cloning
vectors employed to obtain PKS genes encoding derived PKS lack control
sequences for expression operably linked to the encoding nucleotide sequences,
the nucleotide sequences are inserted into appropriate expression vectors. This
need not be done individually; instead, a pool of isolated encoding nucleotide
sequences can be inserted into expression vectors, the resulting vectors
transformed or transfected into host cells, and the resulting cells plated out into
individual colonies. The invention provides a variety of recombinant DNA
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compounds in which the various coding sequences for the domains and modules 
of the megalomicin PKS are flanked by non-naturally occurring restriction enzyme
recognition sites. 

The various PKS nucleotide sequences can be cloned into one or more
recombinant vectors as individual cassettes, with separate control elements, or
under the control of, e.g., a single promoter. The PKS subunit encoding regions
can include flanking restriction sites to allow for the easy deletion and insertion of
other PKS subunit encoding sequences so that hybrid PKSs can be generated. The
design of such unique restriction sites is known to those of skill in the art and can
be accomplished using the techniques described above, such as site-directed
mutagenesis and PCR.

The expression vectors containing nucleotide sequences encoding a variety
of PKS enzymes for the production of different polyketides are then transformed
into the appropriate host cells to construct the library. In one straightforward
approach, a mixture of such vectors is transformed into the selected host cells and
the resulting cells plated into individual colonies and selected to identify
successful transformants. Each individual colony has the ability to produce a
particular PKS and ultimately a particular polyketide. Typically, there
will be duplications in some, most, or all of the colonies; the subset of the
transformed colonies that contains a different PKS in each member colony can be
considered the library. Alternatively, the expression vectors can be used
individually to transform hosts, which transformed hosts are then assembled into a
library. A variety of strategies are available to obtain a multiplicity of colonies
each containing a PKS gene cluster derived from the naturally occurring host gene
cluster so that each colony in the library produces a different PKS and ultimately a
different polyketide. The number of different polyketides that are produced by the
library is typically at least four, more typically at least ten, and preferably at least
20, and more preferably at least 50, reflecting similar numbers of different altered
PKS gene clusters and PKS gene products. The number of members in the library
is arbitrarily chosen; however, the degrees of freedom outlined above with respect
to the variation of starter, extender units, stereochemistry, oxidation state, and
chain length enable the production of quite large libraries.
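
The statement that the library is the subset of colonies in which each member carries a different PKS can be sketched as a deduplication step. The Python below is illustrative only: the colony identifiers and design labels are hypothetical, and in practice the PKS carried by each colony would have to be determined experimentally.

```python
def library_from_colonies(colonies):
    """Keep the first colony found for each distinct PKS design.

    `colonies` is an iterable of (colony_id, pks_design) pairs. Colonies that
    repeat a design already seen are treated as duplicates and dropped.
    """
    library = {}
    for colony_id, pks_design in colonies:
        library.setdefault(pks_design, colony_id)
    return library


# Hypothetical screening result from a mixed transformation.
colonies = [
    ("colony 1", "design A"),
    ("colony 2", "design B"),
    ("colony 3", "design A"),
    ("colony 4", "design C"),
    ("colony 5", "design B"),
]

library = library_from_colonies(colonies)
print(len(library), "distinct PKS designs:", library)
```

Five colonies collapse to three library members here, which illustrates why more colonies must usually be picked than the number of distinct polyketides sought.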
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Methods for introducing the recombinant vectors of the invention into 
suitable hosts are known to those of skill in the art and typically include the use of
CaCl2 or agents such as other divalent cations, lipofection, DMSO, protoplast
transformation, conjugation, infection, transfection, and electroporation. The
polyketide producing colonies can be identified and isolated using known
techniques and the produced polyketides further characterized. The polyketides
produced by these colonies can be used collectively in a panel to represent a
library or may be assessed individually for activity.

The libraries of the invention can thus be considered at four levels: (1) a
multiplicity of colonies each with a different PKS encoding sequence; (2) the
proteins produced from the coding sequences; (3) the polyketides produced from
the proteins assembled into a functional PKS; and (4) antibiotics or compounds
with other desired activities derived from the polyketides. Of course, combination
libraries can also be constructed wherein members of a library derived, for
example, from the megalomicin PKS can be considered as a part of the same
library as those derived from, for example, the rapamycin PKS or DEBS.

Colonies in the library are induced to produce the relevant synthases and
thus to produce the relevant polyketides to obtain a library of polyketides. The
polyketides secreted into the media can be screened for binding to desired targets,
such as receptors, signaling proteins, and the like. The supernatants per se can be
used for screening, or partial or complete purification of the polyketides can first
be effected. Typically, such screening methods involve detecting the binding of
each member of the library to receptor or other target ligand. Binding can be
detected either directly or through a competition assay. Means to screen such
libraries for binding are well known in the art and can be applied in accordance
with the methods of the present invention. Alternatively, individual polyketide
members of the library can be tested against a desired target. In this event, screens
wherein the biological response of the target is measured can more readily be
included. Antibiotic activity can be verified using typical screening assays such as
those set forth in Lehrer et al., 1991, J. Immunol. Meth. 137:167-173, incorporated
herein by reference, and in the Examples below.

The invention provides methods for the preparation of a large number of 
polyketides. These polyketides are useful intermediates in formation of 
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compounds with antibiotic or other activity through hydroxylation, epoxidation, 

and glycosylation reactions as described above. In general, the polyketide products
of the PKS must be further modified, typically by hydroxylation and glycosylation,
to exhibit potent antibiotic activity. Hydroxylation results in the novel polyketides
of the invention that contain hydroxyl groups at C-6, which can be accomplished
using the hydroxylase encoded by the eryF gene, and/or C-12, which can be
accomplished using the hydroxylase encoded by the picK or eryK gene. Also, the
oleP gene is available in recombinant form, which can be used to express the oleP
gene product in any host cell. A host cell, such as a Streptomyces host cell or a
Saccharopolyspora erythraea host cell, modified to express the oleP gene thus can
be used to produce polyketides comprising the C-8-C-8a epoxide present in
oleandomycin. Thus the invention provides such modified polyketides. The
presence of hydroxyl groups at these positions can enhance the antibiotic activity
of the resulting compound relative to its unhydroxylated counterpart.

Methods for glycosylating polyketides are generally known in the art and
can be applied in accordance with the methods of the present invention; the
glycosylation may be effected intracellularly by providing the appropriate
glycosylation enzymes or may be effected in vitro using chemical synthetic means
as described herein and in PCT publication No. WO 98/49315, incorporated
herein by reference. Preferably, glycosylation with desosamine, mycarose, and/or
megosamine is effected in accordance with the methods of the invention in
recombinant host cells provided by the invention. In general, the approaches to
effecting glycosylation mirror those described above with respect to
hydroxylation. The purified enzymes, isolated from native sources or
recombinantly produced, may be used in vitro. Alternatively and as noted,
glycosylation may be effected intracellularly using endogenous or recombinantly
produced intracellular glycosylases. In addition, synthetic chemical methods may
be employed.

The antibiotic modular polyketides may contain any of a number of
different sugars, although D-desosamine, or a close analog thereof, is most
common. Erythromycin, picromycin, megalomicin, narbomycin, and methymycin
contain desosamine. Erythromycin also contains L-cladinose (3-O-methyl
mycarose). Tylosin contains mycaminose (4-hydroxy desosamine), mycarose and
6-deoxy-D-allose. 2-acetyl-1-bromodesosamine has been used as a donor to
glycosylate polyketides by Masamune et al., 1975, J. Am. Chem. Soc. 97:
3512-3513. Other, apparently more stable donors include glycosyl fluorides,
thioglycosides, and trichloroacetimidates; see Woodward et al., 1981, J. Am.
Chem. Soc. 103: 3215; Martin et al., 1997, J. Am. Chem. Soc. 119: 3193; Toshima
et al., 1995, J. Am. Chem. Soc. 117: 3717; Matsumoto et al., 1988, Tetrahedron
Lett. 29: 3575. Glycosylation can also be effected using the polyketide aglycones
as starting materials and using Saccharopolyspora erythraea or Streptomyces
venezuelae or other host cell to make the conversion, preferably using mutants
unable to synthesize macrolides, as discussed in the preceding Section.

Thus, a wide variety of polyketides can be produced by the hybrid PKS 
enzymes of the invention. These polyketides are useful as antibiotics and as 
intermediates in the synthesis of other useful compounds, as described in the 
following section. 

Section VII: Host Cells Containing Multiple Expression Vectors 

A recombinant host cell of the invention may contain nucleic acid 
encoding a megalomicin PKS domain, module, or protein, or megalomicin 
modification enzyme at a single genetic locus, e.g., on a single plasmid or at a 
single chromosomal locus, or at different genetic loci, e.g., on separate plasmids 
and/or chromosomal loci. By "multiple" is meant two or more; by "vector" is 
meant a nucleic acid molecule which can be used to transform host systems and 
which contains an independent expression system containing a coding sequence 
under control of a promoter and optionally a selectable marker and any other 
suitable sequences regulating expression. Typical such vectors are plasmids, but 
other vectors such as phagemids, cosmids, viral vectors and the like can be used 
according to the nature of the host. Of course, one or more of the separate vectors 
may integrate into the chromosome of the host (selection may not be required for 
maintenance of integrated vectors). 

In one embodiment, the invention provides a recombinant host cell that 
comprises at least two separate autonomously replicating recombinant DNA 
expression vectors, each of said vectors comprising a recombinant DNA compound 
encoding a megalomicin PKS domain or a megalomicin modification enzyme 
operably linked to a promoter. In another embodiment, the invention provides a 
recombinant host cell that comprises at least one autonomously replicating 
recombinant DNA expression vector and at least one modified chromosome, each 
of said vector(s) and each of said modified chromosome(s) comprising a recombinant 
DNA compound encoding a megalomicin PKS domain or a megalomicin 
modification enzyme operably linked to a promoter. Preferably, the autonomously 
replicating recombinant DNA expression vector and/or the modified chromosome 
further comprises distinct selectable markers. 

The above multiple-vector (chromosome) expression systems can also be 
used for expressing heterogeneous polyketide biosynthetic enzymes, e.g., for 
expressing Micromonospora megalomicea megalomicin PKS protein, module, or 
domain or a megalomicin modification enzyme with a PKS protein, module, or 
domain, or modification enzyme from other origins in the same host cells. By 
placing various activities on different expression vectors, a high degree of 
variation can be achieved in an efficient manner. A variety of hosts can be used; 
any suitable host cell that can maintain multiple vectors can readily be used. 
Preferred hosts include Streptomyces, yeast, E. coli, other actinomycetes, and plant 
cells; mammalian or insect cells or other suitable recombinant hosts can also 
be used. Preferred among yeast strains are Saccharomyces cerevisiae and Pichia 
pastoris. Preferred actinomycetes include various strains of Streptomyces. 

If one chooses to use a host cell that does not naturally produce a 
polyketide, then one may need to ensure that the recombinant host is modified to 
also contain a holo ACP synthase activity that effects pantetheinylation of the acyl 
carrier protein. See PCT Pub. No. WO 97/13845, incorporated herein by 
reference. One of the multiple vectors may be used for this purpose. This 
pantetheinylation step is necessary for activation of the ACP. The expression system for 
the holo ACP synthase may be supplied on a vector separate from that carrying a 
PKS coding sequence or may be supplied on the same vector or may be integrated 
into the chromosome of the host, or may be supplied as an expression system for a 
fusion protein with all or a portion of a polyketide synthase (see U.S. Patent No. 
6,033,883, incorporated herein by reference). 

It should be noted that in some recombinant hosts, it may also be necessary 
to activate the polyketides produced through postsynthesis modifications when 
polyketides having such modifications are desired. If this is the case for a 
particular host, the host will be modified, for example by transformation, to 
contain those enzymes necessary for effecting these modifications. Among such 
enzymes, for example, are glycosylation enzymes. The use of multiple vectors can 
facilitate the introduction of expression systems for such enzymes. 

In a preferred embodiment, the multiple vector system is used to assemble 
rapidly and efficiently a combinatorial library of polyketides and the 
PKS/modification enzymes that produce them. In an illustrative embodiment, the 
multiple vector system comprises four different vectors, one comprising the megAI 
gene, one the megAII gene, one the megAIII gene, and one the modification 
enzyme(s) gene(s). Each of these vectors can be modified to make a set of vectors. 
For example, one set could contain all possible AT substitutions in the loading and 
first and second extender modules of the megAI gene product. Another set could 
contain expression systems for a variety of different modification enzymes. With 
these four vector sets and by combining each member of each set with each 
member of the other three sets, a very large library of cells, vector sets, and 
polyketides can be rapidly and efficiently assembled. 
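
Combining each member of each vector set with each member of the other sets amounts to taking a Cartesian product of the four sets. The following is a minimal sketch of that bookkeeping in Python. The vector names and set sizes are hypothetical placeholders chosen for illustration and are not constructs described in this specification.

```python
from itertools import product

# Hypothetical vector sets: one set per expression vector position.
# All names below are illustrative placeholders, not actual constructs.
vector_sets = {
    "megAI": ["megAI-wt", "megAI-AT1swap", "megAI-AT2swap"],
    "megAII": ["megAII-wt", "megAII-KR3del"],
    "megAIII": ["megAIII-wt"],
    "modification": ["none", "glycosylation", "hydroxylation"],
}

def enumerate_library(sets):
    """Return every combination that takes exactly one vector from each set."""
    names = list(sets)
    return [dict(zip(names, combo)) for combo in product(*(sets[n] for n in names))]

library = enumerate_library(vector_sets)
print(len(library))  # number of distinct four-vector host cell combinations
```

The library size is simply the product of the set sizes, so adding one variant to any single set multiplies the number of host cells that can be assembled.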

The combinatorial potential of a modular PKS such as the megalomicin 
PKS (ignoring the additional potential of different modification enzyme systems) 
is minimally given by: AT_L × (AT_E × 4)^M, where AT_L is the number of loading 
acyl transferases, AT_E is the number of extender acyl transferases, and M is the 
number of modules in the gene cluster. The number 4 is present in the formula 
because this represents the number of ways a keto group can be modified by either 
1) no reaction; 2) KR activity alone; 3) KR+DH activity; or 4) KR+DH+ER 
activity. It has been shown that expression of only the first two modules of the 
erythromycin PKS resulted in the production of a predicted truncated triketide 
product (see Kao et al., J. Am. Chem. Soc., 116:11612-11613 (1994)). A novel 
12-membered macrolide similar to methymycin aglycone was produced by 
expression of modules 1-5 of this PKS in S. coelicolor (see Kao et al., J. Am. 
Chem. Soc., 117:9105-9106 (1995)). This work shows that PKS modules are 
functionally independent so that lactone ring size can be controlled by the number 
of modules present. 

In addition to controlling the number of modules, the modules can be 
genetically modified, for example, by the deletion of a ketoreductase domain as 
described by Donadio et al., Science, 252:675-679 (1991); and Donadio et al., 
Gene, 115:97-103 (1992). In addition, the mutation of an enoyl reductase domain 
was reported by Donadio et al., Proc. Natl. Acad. Sci., 90:7119-7123 (1993). 
These modifications also resulted in modified PKS and thus modified polyketides. 
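
The minimal combinatorial potential stated above, the number of loading acyl transferases multiplied by (the number of extender acyl transferases times 4) raised to the number of modules, can be evaluated directly. The sketch below is illustrative only. The function name and the example counts of acyl transferases are assumptions and are not values taken from this specification.

```python
def combinatorial_potential(at_loading, at_extender, modules):
    """Minimal number of PKS variants: AT_L x (AT_E x 4)^M.

    The factor 4 counts the keto-group outcomes per module:
    no reaction, KR alone, KR+DH, or KR+DH+ER.
    """
    return at_loading * (at_extender * 4) ** modules

# Hypothetical inputs for a six-module PKS such as the megalomicin PKS.
print(combinatorial_potential(1, 1, 6))  # one AT choice everywhere: 4**6 variants
print(combinatorial_potential(2, 2, 6))  # two loading and two extender AT choices
```

Because the module count appears as an exponent, even a small increase in the number of available extender acyl transferases enlarges the potential library far more than an additional loading acyl transferase does.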

As stated above, in the present invention, the coding sequences for 
catalytic activities derived from the megalomicin PKS systems found in nature can 
be used in their native forms or modified by standard mutagenesis techniques to 
delete or diminish activity or to introduce an activity into a module in which it was 
not originally present. For example, a KR activity can be introduced into a 
module normally lacking that function. 

In one embodiment of the invention herein, a single host cell is modified to 
contain a multiplicity of vectors, each vector contributing a portion of the 
synthesis of a megalomicin PKS and modification enzyme (if any) system. Each 
of the multiple vectors for production of the megalomicin PKS system typically 
encodes at least two modules, and at least one of the vectors integrates into the 
chromosome of the host. Integration can be effected using suitable phage or 
integrating vectors or by homologous recombination. If homologous 
recombination is used, the integration event may also be designed to delete 
endogenous PKS genes residing in the chromosome, as described in the PCT 
application WO 95/08548. In these embodiments, too, a selectable marker such as 
hygromycin or thiostrepton resistance can be included in the vector that effects 
integration. 

As mentioned above, additional enzymes that effect post-translational 
modifications to the enzyme systems in the megalomicin PKS may be introduced 
into the host through suitable recombinant expression systems. In addition, 
enzymes that activate the polyketides themselves, for example, through 
glycosylation may be added. It may also be desirable to modify the cell to produce 
more of a particular substrate utilized in polyketide biosynthesis. For example, it 
is generally believed that malonyl CoA levels in yeast are higher than 
methylmalonyl CoA levels; if yeast is chosen as a host, it may be desirable to increase 
methylmalonyl CoA levels by the addition of one or more biosynthetic enzymes 
therefor. 

The multiple-vector expression system can also be used to make 
polyketides produced by the addition of synthetic starter units to a PKS that 
contains an inactivated ketosynthase (KS) in the first module. As noted above, 
this modification permits the system to incorporate a suitable diketide thioester 
such as 3-hydroxy-2-methyl pentanoic acid-N-acetyl cysteamine thioester, or 
similar thioesters of diketide analogs, as described by Jacobsen et al., Science, 
277:367-369 (1997). The construction of PKS modules containing inactivated 
ketosynthase regions can be conducted by methods known in the art, such as the 
method described in U.S. Patent No. 6,080,555 and PCT publication Nos. WO 
99/03986 and 97/02358, each of which is incorporated herein by reference, in 
accordance with the methods of the present invention. 

The multiple-vector expression system can be used to produce polyketides 
in hosts that normally do not produce them, such as E. coli and yeast. It also 
provides more efficient means to provide a variety of polyketide products by 
supplying the elements of the introduced PKS, whether in an E. coli or yeast host 
or in other more traditionally used hosts, such as Streptomyces. The invention 
also includes libraries of polyketides prepared using the methods of the invention. 

Section VIII: Compounds 

The methods and recombinant DNA compounds of the invention are useful 
in the production of polyketides. In one important aspect, the invention provides 
methods for making antibiotic compounds related in structure to erythromycin, a 
potent antibiotic compound. The invention also provides novel ketolide 
compounds, polyketide compounds with potent antibiotic activity of significant 
interest due to activity against antibiotic resistant strains of bacteria. See 
Griesgraber et al., 1996, J. Antibiot. 49: 465-477, incorporated herein by 
reference. Most if not all of the ketolides prepared to date are synthesized using 
erythromycin A, a derivative of 6-dEB, as an intermediate. In one embodiment, 
the present invention provides the 3-keto derivatives of the megalomicins for use 
as antibiotics. In particular, the 3-keto derivative of megalomicin A is a preferred 
ketolide of the invention. These compounds can be made chemically, substantially 
in accordance with the procedures for making ketolides described in the prior art, 
or in recombinant host cells of the invention in which the megosamine and 
desosamine biosynthetic and transferase genes are present but which do not make 
or transfer the mycarose moiety and/or the PKS has been modified to delete the 
KR domain of extender module 6. The invention also provides methods for 
making intermediates useful in preparing traditional, 6-dEB- and erythromycin-derived 
ketolide compounds. See Griesgraber et al., supra; Agouridas et al., 1998, 
J. Med. Chem. 41: 4080-4100, U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510; 
5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400; 
5,527,780; 5,444,051; 5,439,890; 5,439,889; and PCT publication Nos. WO 
98/09978 and 98/28316, each of which is incorporated herein by reference. 

As noted above, the hybrid PKS genes of the invention can be expressed in 
a host cell that contains the desosamine, megosamine, and/or mycarose 
biosynthetic genes and corresponding transferase genes as well as the required 
hydroxylase gene(s), which may, for example and without limitation, be either 
picK, megK, or eryK (for the C-12 position) and/or megF or eryF (for the C-6 
position). The resulting compounds have antibiotic activity but can be further 
modified, as described in the patent publications referenced above, to yield a 
desired compound with improved or otherwise desired properties. Alternatively, 
the aglycone compounds can be produced in the recombinant host cell, and the 
desired glycosylation and hydroxylation steps carried out in vitro or in vivo, in the 
latter case by supplying the converting cell with the aglycone, as described above. 

The compounds of the invention are thus optionally glycosylated forms of 
the polyketide set forth in formula (1) below which are hydroxylated at either the 
C-6 or the C-12 or both. The compounds of formula (1) can be prepared using the 
loading and the six extender modules of a modular PKS, modified or prepared in 
hybrid form as herein described. These polyketides have the formula: 

including the glycosylated and isolated stereoisomeric forms thereof; 

wherein R* is a straight chain, branched or cyclic, saturated or unsaturated 
substituted or unsubstituted hydrocarbyl of 1-15C; 

each of R^1-R^6 is independently H or alkyl (1-4C) wherein any alkyl at R^1 
may optionally be substituted; 

each of X^1-X^5 is independently two H, H and OH, or =O; or 

each of X^1-X^5 is independently H and the compound of formula (2) 
contains a double-bond in the ring adjacent to the position of said X at 2-3, 4-5, 
6-7, 8-9 and/or 10-11; 

with the proviso that: 

at least two of R^1-R^6 are alkyl (1-4C). 

Preferred compounds comprising formula 2 are those wherein at least three 
of R^1-R^3 are alkyl (1-4C), preferably methyl or ethyl; more preferably wherein at 
least four of R^1-R^5 are alkyl (1-4C), preferably methyl or ethyl. Also preferred are 
those wherein X^2 is two H, =O, or H and OH, and/or X^3 is H, and/or X^1 is OH 
and/or X^4 is OH and/or X^5 is OH. Also preferred are compounds with variable R* 
when R^1-R^5 are methyl, X^2 is =O, and X^1, X^4 and X^5 are OH. The glycosylated 
forms (i.e., mycarose or cladinose at C-3, desosamine at C-5, and/or megosamine 
at C-6) of the foregoing are also preferred. 

As described above, there are a wide variety of diverse organisms that can 
modify compounds such as those described herein to provide compounds with or 
that can be readily modified to have useful activities. For example, 
Saccharopolyspora erythraea can convert 6-dEB to a variety of useful 
compounds. The compounds provided by the present invention can be provided to 
cultures of Saccharopolyspora erythraea and converted to the corresponding 
derivatives of erythromycins A, B, C, and D in accordance with the procedure 
provided in the Examples, below. To ensure that only the desired compound is 
produced, one can use an S. erythraea eryA mutant that is unable to produce 
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J. 
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as 
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in 
multiple genes, to accumulate a preferred compound. The conversion can also be 
carried out in large fermentors for commercial production. Each of the 
erythromycins A, B, C, and D has antibiotic activity, although erythromycin A has 
the highest antibiotic activity. Moreover, each of these compounds can form, 
under treatment with mild acid, a C-6 to C-9 hemiketal with motilide activity. For 
formation of hemiketals with motilide activity, erythromycins B, C, and D are 
preferred, as the presence of a C-12 hydroxyl allows the formation of an inactive 
compound that has a hemiketal formed between C-9 and C-12. 

Thus, the present invention provides the compounds produced by 
hydroxylation and glycosylation of the compounds of the invention by action of 
the enzymes endogenous to Saccharopolyspora erythraea and mutant strains of S. 
erythraea. Such compounds are useful as antibiotics or as motilides directly or 
after chemical modification. For use as antibiotics, the compounds of the 
invention can be used directly without further chemical modification. 
Erythromycins A, B, C, and D all have antibiotic activity, and the corresponding 
compounds of the invention that result from the compounds being modified by 
Saccharopolyspora erythraea also have antibiotic activity. These compounds can 
be chemically modified, however, to provide other compounds of the invention 
with potent antibiotic activity. For example, alkylation of erythromycin at the C-6 
hydroxyl can be used to produce potent antibiotics (clarithromycin is 
C-6-O-methyl), and other useful modifications are described in, for example, Griesgraber 
et al., 1996, J. Antibiot. 49: 465-477, Agouridas et al., 1998, J. Med. Chem. 41: 
4080-4100, U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510; 5,747,467; 
5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400; 5,527,780; 
5,444,051; 5,439,890; and 5,439,889; and PCT publication Nos. WO 98/09978 
and 98/28316, each of which is incorporated herein by reference. 

For use as motilides, the compounds of the invention can be used directly 
without further chemical modification. Erythromycin and certain erythromycin 
analogs are potent agonists of the motilin receptor that can be used clinically as 
prokinetic agents to induce phase III of migrating motor complexes, to increase 
esophageal peristalsis and LES pressure in patients with GERD, to accelerate 
gastric emptying in patients with gastric paresis, and to stimulate gall bladder 
contractions in patients after gallstone removal and in diabetics with autonomic 
neuropathy. See Peeters, 1999, Motilide Web Site, 
http://www.med.kuleuven.ac.be/med/gih/motilid.htm, and Omura et al., 1987, 
Macrolides with gastrointestinal motor stimulating activity, J. Med. Chem. 30: 
1941-3. The corresponding compounds of the invention that result from the 
compounds of the invention being modified by Saccharopolyspora erythraea also 
have motilide activity, particularly after conversion, which can also occur in vivo, 
to the C-6 to C-9 hemiketal by treatment with mild acid. Compounds lacking the 
C-12 hydroxyl 
are especially preferred for use as motilin agonists. These compounds can also be 
further chemically modified, however, to provide other compounds of the 
invention with potent motilide activity. 

Moreover, and also as noted above, there are other useful organisms that 
can be employed to hydroxylate and/or glycosylate the compounds of the 
invention. As described above, the organisms can be mutants unable to produce 
the polyketide normally produced in that organism, the fermentation can be carried 
out on plates or in large fermentors, and the compounds produced can be 
chemically altered after fermentation. In addition to Saccharopolyspora erythraea, 
Streptomyces venezuelae, S. narbonensis, S. antibioticus, Micromonospora 
megalomicea, S. fradiae, and S. thermotolerans can also be used. In addition to 
antibiotic activity, compounds of the invention produced by treatment with M. 
megalomicea enzymes can have antiparasitic activity as well. Thus, the present 
invention provides the compounds produced by hydroxylation and glycosylation 
by action of the enzymes endogenous to S. erythraea, S. venezuelae, S. 
narbonensis, S. antibioticus, M. megalomicea, S. fradiae, and S. thermotolerans. 

The present invention also provides methods and genetic constructs for 
producing the glycosylated and/or hydroxylated compounds of the invention 
directly in the host cell of interest. Thus, the recombinant genes of the invention, 
which include recombinant megAI, megAII, and megAIII genes with one or more 
deletions and/or insertions, including replacements of a megA gene fragment with 
a gene fragment from a heterologous PKS gene, can be included on expression 
vectors suitable for expression of the encoded gene products in 
Saccharopolyspora erythraea, Micromonospora megalomicea, S. venezuelae, S. 
narbonensis, S. antibioticus, S. fradiae, and S. thermotolerans. 

The compounds of the invention can be produced by growing and 
fermenting the host cells of the invention under conditions known in the art for the 
production of other polyketides. The compounds of the invention can be isolated 
from the fermentation broths of these cultured cells and purified by standard 
procedures. The compounds can be readily formulated to provide the 
pharmaceutical compositions of the invention. The pharmaceutical compositions 
of the invention can be used in the form of a pharmaceutical preparation, for 
example, in solid, semisolid, or liquid form. This preparation will contain one or 
more of the compounds of the invention as an active ingredient in admixture with 
an organic or inorganic carrier or excipient suitable for external, enteral, or 
parenteral application. The active ingredient may be compounded, for example, 
with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pellets, 
capsules, suppositories, solutions, emulsions, suspensions, and any other form 
suitable for use. 

The carriers which can be used include water, glucose, lactose, gum acacia, 
gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin, 
colloidal silica, potato starch, urea, and other carriers suitable for use in 
manufacturing preparations, in solid, semi-solid, or liquified form. In addition, 
auxiliary stabilizing, thickening, and coloring agents and perfumes may be used. 
For example, the compounds of the invention may be utilized with hydroxypropyl 
methylcellulose essentially as described in U.S. Patent No. 4,916,138, 
incorporated herein by reference, or with a surfactant essentially as described in 
EPO patent publication No. 428,169, incorporated herein by reference. 

Oral dosage forms may be prepared essentially as described by Hondo et 
al., 1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein 
by reference. Dosage forms for external application may be prepared essentially as 
described in EPO patent publication No. 423,714, incorporated herein by 
reference. The active compound is included in the pharmaceutical composition in 
an amount sufficient to produce the desired effect upon the disease process or 
condition. 

For the treatment of conditions and diseases caused by infection, a 
compound of the invention may be administered orally, topically, parenterally, by 
inhalation spray, or rectally in dosage unit formulations containing conventional 
non-toxic pharmaceutically acceptable carriers, adjuvants, and vehicles. The term 
parenteral, as used herein, includes subcutaneous injections, and intravenous, 
intramuscular, and intrasternal injection or infusion techniques. 

Dosage levels of the compounds of the invention are of the order from 
about 0.01 mg to about 50 mg per kilogram of body weight per day, preferably 
from about 0.1 mg to about 10 mg per kilogram of body weight per day. The 
dosage levels are useful in the treatment of the above-indicated conditions (from 
about 0.7 mg to about 3.5 g per patient per day, assuming a 70 kg patient). In 
addition, the compounds of the invention may be administered on an intermittent 
basis, i.e., at semi-weekly, weekly, semi-monthly, or monthly intervals. 
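
The per-patient figures follow from multiplying the per-kilogram dosage by body weight. The short sketch below works through that arithmetic for the stated 70 kg patient. The function name is a hypothetical helper introduced only for illustration, and the sketch is not dosing guidance.

```python
def daily_dose_mg(dose_mg_per_kg, body_weight_kg=70.0):
    """Convert a per-kilogram daily dosage into a per-patient daily dose in mg."""
    # Rounding removes binary floating-point noise such as 0.7000000000000001.
    return round(dose_mg_per_kg * body_weight_kg, 6)

# Ranges stated in the text, in mg per kg of body weight per day.
ranges = {"broad": (0.01, 50.0), "preferred": (0.1, 10.0)}

for label, (low, high) in ranges.items():
    print(label, daily_dose_mg(low), "mg to", daily_dose_mg(high), "mg per day")
```

For the broad range this gives 0.7 mg to 3500 mg (3.5 g) per day, and for the preferred range 7 mg to 700 mg per day.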

The amount of active ingredient that may be combined with the carrier 
materials to produce a single dosage form will vary depending upon the host 
treated and the particular mode of administration. For example, a formulation 
intended for oral administration to humans may contain from 0.5 mg to 5 gm of 
active agent compounded with an appropriate and convenient amount of carrier 
material, which may vary from about 5 percent to about 95 percent of the total 
composition. Dosage unit forms will generally contain from about 0.5 mg to about 
500 mg of active ingredient. For external administration, the compounds of the 
invention may be formulated within the range of, for example, 0.00001% to 60% 
by weight, preferably from 0.001% to 10% by weight, and most preferably from 
about 0.005% to 0.8% by weight. 

It will be understood, however, that the specific dose level for any 
particular patient will depend on a variety of factors. These factors include the 
activity of the specific compound employed; the age, body weight, general health, 
sex, and diet of the subject; the time and route of administration and the rate of 
excretion of the drug; whether a drug combination is employed in the treatment; 
and the severity of the particular disease or condition for which therapy is sought. 

A detailed description of the invention having been provided above, the 
following examples are given for the purpose of illustrating the invention and shall 
not be construed as being a limitation on the scope of the invention or claims. 

Example 1 

Cloning and Characterization of the Megalomicin Biosynthetic Gene Cluster from 
Micromonospora megalomicea 

Experimental Procedures 

Bacterial Strains, Media, and Growth Conditions 

Routine DNA manipulations were performed in Escherichia coli XL1 Blue 
or E. coli XL1 Blue MR (Stratagene) using standard culture conditions (Sambrook 
et al., 1989). M. megalomicea subsp. nigra NRRL3275 was obtained from the 
ATCC collection and cultured according to recommended protocols. For isolation 
of genomic DNA, M. megalomicea was grown in TSB (Hopwood et al., 1985) at 
30°C. S. lividans K4-114 (Ziermann and Betlach, 1999), which carries a deletion 
of the actinorhodin biosynthetic gene cluster, was used as the host for expression 
of the megAI-AIII genes. S. lividans strains were maintained on R5 agar at 30°C 
and grown in liquid YEME for preparation of protoplasts (Hopwood et al., 1985). 
S. erythraea NRRL2338 was used for expression of the megosamine genes. S. 
erythraea strains were maintained on R5 agar at 34°C and grown in liquid TSB for 
preparation of protoplasts. 

Manipulation of DNA and Organisms 

Manipulation and transformation of DNA in E. coli was performed by 
standard procedures (Sambrook et al., 1989) or by suppliers' protocols. Protoplasts 
of S. lividans and S. erythraea were generated for transformation by plasmid DNA 
using the standard procedure. S. lividans transformants were selected on R5 using 
2 ml of a 0.5 mg/ml thiostrepton overlay. S. erythraea transformants were selected 
on R5 using 1.5 ml of a 0.6 mg/ml apramycin overlay. 

Isolation of the meg gene cluster 

A cosmid library was prepared in SuperCos (Stratagene) from M. 
megalomicea total DNA partially digested with Sau3A I, and introduced into E. 
coli using a Gigapack III XL (Stratagene) in vitro packaging kit. 32P-labelled DNA 
probes encompassing the KS2 domain from ery DEBS, or a mixture of segments 
encompassing modules 1 and 2 from ery DEBS, were used separately to screen the 
cosmid library by colony hybridization. Several colonies which hybridized with 
the probes were further analyzed by sequencing the ends of their cosmid inserts 
using T3 and T7 primers. BLAST (Altschul et al., 1990) analysis of the sequences 
revealed several colonies with DNA sequences highly homologous to genes from 
the ery cluster. Together with restriction analysis, this led to the isolation of two 
overlapping cosmids, pKOS079-93A and pKOS079-93D, which covered ~45 kb of 
the meg cluster. A 400 bp PCR fragment was generated from the left end of 
pKOS079-93D and used to reprobe the cosmid library. Likewise, a 200 bp PCR 
fragment generated from the right end of pKOS079-93A was used to reprobe the 
cosmid library. Analysis of hybridizing colonies as described above resulted in 
identification of two additional cosmids, pKOS079-138B and pKOS79-124B, 
which overlap the previous two cosmids. BLAST analysis of the far left and right 
end sequences of these cosmids indicated no homology to any known genes 
related to polyketide biosynthesis and therefore indicates that the set of four 
cosmids spans the entire megalomicin biosynthetic gene cluster. 

DNA sequencing and analysis 

PCR-based double stranded DNA sequencing was performed on a 
Beckman CEQ 2000 capillary sequencer using reagents and protocols provided by 
the manufacturer. A shotgun library of the entire cosmid pKOS079-93D insert was 
made as follows: DNA was first digested with Dra I to eliminate the vector 
fragment, then partially digested with Sau3A I. After agarose electrophoresis, 
bands between 1-3 kb were excised from the gel and ligated with BamH I digested 
pUC19. Another shotgun library was generated from a 12 kb Xho I/EcoR I 
fragment subcloned from cosmid pKOS079-93A to extend the sequence to the 
megF gene. A 4 kb Bgl II/Xho I fragment from cosmid pKOS079-138B was 
sequenced by primer walking to extend the sequencing to the megT gene. 
Sequence was assembled using Sequencher (Gene Codes Corp.) software package 
and analyzed with MacVector (Oxford Molecular Group) and the NCBI BLAST 
server (www.ncbi.nlm.nih.gov/BLAST/). 

Plasmids 

Plasmid pKOS108-6 is a modified version of pKAO127'kan' (Ziermann 
and Betlach, 1999; Ziermann and Betlach, 2000) in which the eryAI-III genes 
between the Pac I and EcoR I sites have been replaced with the megAI-III genes. 
This was done by first substituting a synthetic nucleotide DNA duplex 
(5'-TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC (SEQ ID NO: 21), 
complementary oligo 
5'-AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT (SEQ ID NO: 22)) 
between the Pac I and EcoR I sites of the pKAO127'kan' vector fragment. 
The 22 kb EcoR I/Bgl II fragment from cosmid pKOS079-93D containing the 
megAI-II genes was inserted into EcoR I and Bgl II sites of the resulting plasmid to 
generate pKOS024-84. A 12 kb Bgl II/BbvC I fragment containing the megAIII 
and part of the megCII gene was subcloned from pKOS079-93A and excised as a 
Bgl II/Xba I fragment and ligated into the corresponding sites of pKOS024-84 to 
yield the final expression plasmid pKOS108-06. 
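
The design of the synthetic duplex can be checked computationally. The sketch below confirms that the two oligonucleotides (SEQ ID NO: 21 and 22) anneal over the full length of the first oligo and reports the single-stranded extensions and the cloning sites the duplex carries. The recognition sequences in the code are the standard ones for the named enzymes. The variable and function names are illustrative assumptions and do not come from this specification.

```python
TOP = "TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC"           # SEQ ID NO: 21, written 5' to 3'
BOTTOM = "AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT"  # SEQ ID NO: 22, written 5' to 3'

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Standard recognition sequences of the enzymes used in the cloning steps.
SITES = {"EcoR I": "GAATTC", "Bgl II": "AGATCT", "BbvC I": "CCTCAGC", "Xba I": "TCTAGA"}

paired = reverse_complement(TOP)           # region of BOTTOM that anneals to TOP
start = BOTTOM.find(paired)                # -1 would mean the oligos do not anneal
five_prime_overhang = BOTTOM[:start]       # unpaired bases at the 5' end of BOTTOM
three_prime_overhang = BOTTOM[start + len(paired):]  # unpaired bases at its 3' end
sites_found = {name: site in TOP for name, site in SITES.items()}

print(start, five_prime_overhang, three_prime_overhang)
print(sites_found)
```

Under these assumptions the duplex leaves a 5' AATT extension, which matches an EcoR I-cut end, and a 3' AT extension, which appears consistent with a Pac I-cut end. It also carries the EcoR I, Bgl II, BbvC I and Xba I sites used in the subsequent steps.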

The megosamine integrating vector, pKOS97-42, was constructed as 
follows: A subclone was generated containing the 4 kb Xho I/Sca I fragment from 
pKOS79-138B together with the 1.7 kb Sca I/Pst I fragment from pKOS79-93D in 
Litmus 28 (Stratagene). The entire 5.7 kb fragment was then excised as a Spe I/Pst 
I fragment and combined with the 6.3 kb Pst I/EcoR I fragment from pKOS79-93D 
and EcoR I/Xba I digested pSET152 (Bierman et al., 1992) to construct plasmid 
pKOS97-42. 

Production and analysis of secondary metabolites 
Fermentation for production of polyketide, LC/MS analysis, and
quantification of 6-dEB for S. lividans K4-114/pKOS108-6 and S. lividans
K4-114/pKAO127'kan' were essentially as previously described (Xue et al., 1999).
S. erythraea NRRL2338 and S. erythraea/pKOS97-42 were grown for 6 days in F1
media (Brunker et al., 1998). Samples of broth were clarified in a microcentrifuge
(5 min, 13,000 rpm). For LC/MS preparation, isopropanol was added to the
supernatant (1:2 ratio) and centrifuged again. Erythromycins and megalomicins
were detected by electrospray mass spectrometry and quantity was determined by
evaporative light scattering detection (ELSD). The LC retention time and mass
spectra of erythromycin and megalomicins were identical to known standards.

Nucleotide sequence of the meg gene cluster 

A series of 4 overlapping inserts containing the meg cluster (Figure 9) were
isolated from a cosmid library prepared from total genomic DNA of M.
megalomicea; together they cover > 100 kb of the genome. A contiguous 48 kb
segment which encodes the megalomicin PKS and several deoxysugar biosynthetic
genes was sequenced and analyzed. The segment contains 17 complete ORFs as well
as an incomplete ORF at each end, organized as shown in Figure 9.

PKS genes. The ORFs megAI, megAII and megAIII encode the polyketide
synthase responsible for synthesis of 6-dEB. The enzyme complex, meg DEBS, is
highly similar to ery DEBS, with each of the three predicted polypeptides sharing
an average of 83% overall similarity with their ery PKS counterpart. Both PKSs
are composed of 6 modules (2 modules per polypeptide) and each module is
organized in the identical manner (Figure 9). A dendrogram analysis (Schwecke et
al., 1995) employing 70 acyltransferase (AT) domains revealed that the 6 meg
extender AT domains cluster with AT domains that incorporate methylmalonyl
CoA (not shown). The loading module of meg DEBS also lacks a KSQ domain
which is utilized by most macrolide PKSs for decarboxylation of the starter unit to
initiate polyketide synthesis (Bisang et al., 1999; Kuhstoss et al., 1996; Kakavas et
al., 1997; Xue et al., 1998), implying that priming begins with a propionate unit.
In addition, a conserved Gly to Pro substitution in the NADPH-binding region of
the ketoreductase (KR) domain of module 3 is observed in meg DEBS, which has
been proposed to account for its inactivity in ery DEBS (Donadio et al., 1991).

Deoxysugar genes. BLAST (Altschul et al., 1990) analysis of the genes
flanking the PKS indicated that 12 complete ORFs and 1 partial ORF appear to
encode functions required for synthesis of one of the three megalomicin
deoxysugars. Assignment of each ORF to a specific deoxysugar pathway was
made based on comparison to the ery genes and other related genes involved in
deoxysugar biosynthesis (Table 2).



Table 2. Deduced functions of genes identified in the megalomicin gene cluster.

Gene     Closest Match (polypeptide)     %Sim(a)  Proposed Pathway     Proposed Function           Reference
megT     EryBVI                          63       Mycarose/Megosamine  2,3-Dehydratase             (Summers et al., 1997; Gaisser et al., 1997)
megDVI   EryCII                          79       Megosamine           3,4-Isomerase               (Summers et al., 1997)
megDI    EryCIII                         52       Megosamine           Glycosyltransferase         (Summers et al., 1997)
megY     AcyA (S. thermotolerans)                                      Mycarose O-acyltransferase  (Arisawa et al., 1994)
megDII   EryCI                           58       Megosamine           Aminotransferase            (Dhillon et al., 1989; Summers et al., 1997)
megDIII  DesVI (S. venezuelae)           61       Megosamine           Dimethyltransferase         (Xue et al., 1998)
megDIV   DnmU (S. peucetius)             65       Megosamine           3,5-Epimerase               (Olano et al., 1999)
megDV    Dehydrogenase (A. orientalis)   61       Megosamine           4-Ketoreductase             (Summers et al., 1997; van Wageningen et al., 1998)
megDVII  EryBII                          73       Megosamine           2,3-Reductase               (Summers et al., 1997)
megBV    EryBV                           86       Mycarose             Glycosyltransferase         (Summers et al., 1997; Gaisser et al., 1997)
megBIV   EryBIV                          80       Mycarose             4-Ketoreductase             (Summers et al., 1997; Gaisser et al., 1997)
megAI    EryAI                           81       6-dEB                Polyketide Synthase         (Donadio and Katz, 1992)
megAII   EryAII                          85       6-dEB                Polyketide Synthase         (Donadio and Katz, 1992)
megAIII  EryAIII                         83       6-dEB                Polyketide Synthase         (Donadio and Katz, 1992)
megCII   EryCII                          82       Desosamine           3,4-Isomerase               (Summers et al., 1997)
megCIII  EryCIII                         89       Desosamine           Glycosyltransferase         (Summers et al., 1997)
megBII   EryBII                          87       Mycarose             2,3-Reductase               (Summers et al., 1997)
megH     EryH                            84                            Thioesterase                (Haydock et al., 1991)
megF     EryF                                                          C-6 Hydroxylase             (Weber et al., 1991)

a. Determined by BLASTX analysis using default parameters.

Three ORFs, megBV, megCIII and megDI, encode glycosyltransferases,
apparently one for attachment of each deoxysugar to the macrolide. MegBV was
most similar to EryBV, the erythromycin mycarosyltransferase, and hence was
assigned to the mycarose pathway in the meg cluster. The closest match for both of
the remaining glycosyltransferases was EryCIII, the desosaminyltransferase in
erythromycin biosynthesis. Given the higher degree of similarity between EryCIII
and MegCIII (Table 2), MegCIII was designated the desosaminyltransferase,
leaving MegDI as the proposed megosaminyltransferase. In similar fashion,
assignments were made accordingly for: MegCII and MegDVI, two putative 3,4-
isomerases similar to EryCII; MegBII and MegDVII, 2,3-reductases homologous
to EryBII; MegBIV and MegDV, putative 4-ketoreductases similar to EryBIV
(Table 2). The remaining ORFs involved in deoxysugar biosynthesis, megT,
megDII, megDIII and megDIV, each encode a putative 2,3-dehydratase,
aminotransferase, dimethyltransferase and 3,5-epimerase, respectively (Table 2).
Since both the megosamine and desosamine pathways require an aminotransferase
and a dimethyltransferase, and since mycarose and megosamine each require a
2,3-dehydratase and a 3,5-epimerase, assignments of these four genes to a specific
pathway could not be made on the basis of sequence comparison alone. However,
the latter three are implicated in megosamine biosynthesis by experiments
described below.

Other genes. Two additional complete ORFs, designated megY and megH,
and an incomplete ORF, designated megF, were also identified in the cluster.
MegH and MegF share high degrees of similarity with EryH and EryF. EryH and
homologs in other macrolide gene clusters are thioesterase-like proteins with
unknown function in polyketide gene clusters (Haydock et al., 1991; Xue et al.,
1998; Butler et al., 1999; Tang et al., 1999). eryF encodes the erythronolide B C-6
hydroxylase (Figure 8) (Weber et al., 1991; Andersen and Hutchinson, 1992).
MegY does not have an ery counterpart but appears to belong to a (small) family
of O-acyltransferases that transfer short acyl chains to macrolides. Two classes
exist: AcyA and MdmB transfer acetyl or propionyl groups to the C-3 hydroxyls
on 16-membered macrolide rings (Arisawa et al., 1994; Hara and Hutchinson,
1992); CarE and Mpt transfer isovalerate or propionate to the mycarosyl moiety of
carbomycin and midecamycin, respectively (Epp et al., 1989; Arisawa et al., 1993;
Gu et al., 1996). The structures of various megalomicins suggest that MegY
belongs to the latter class and is the acyltransferase which converts megalomicin A
to megalomicins B, C1, or C2 (verified experimentally below).

Heterologous expression of the meg PKS genes.

The wild type and genetically modified versions of the ery DEBS have
been used extensively in heterologous Streptomyces hosts for enzyme studies and
the production of novel polyketide compounds. Given the similarities between the
ery and meg DEBSs, production characteristics were compared in a commonly
used Streptomyces host strain. The three megA ORFs were cloned into the
expression plasmid pKAO127'kan' (Ziermann and Betlach, 1999) in place of the
eryA ORFs. Both plasmids, pKAO127'kan' encoding ery DEBS and pKOS108-06
encoding meg DEBS, were introduced in Streptomyces lividans K4-114 and the
production of 6-dEB was determined in shake-flask fermentations. The production
profiles were similar in both cases and the maximum titer of 6-dEB was between
30-40 mg/L. In addition, both PKSs produced small amounts (~5%) of 8,8a-
deoxyoleandolide, which results from the priming of the PKS with acetate instead
of propionate (Kao et al., 1994b). This observation indicates that the loading AT
domains of the PKSs display similar relaxed specificities towards starter units.

Conversion of erythromycin to megalomicin in S. erythraea.

An examination of the meg cluster revealed that the putative megosamine
biosynthetic genes are clustered directly upstream of the PKS genes. If the
hypothesis that these genes are sufficient for biosynthesis and attachment of
megosamine to an erythromycin intermediate is correct, then functional expression
of these genes in a strain which produces erythromycin, such as S. erythraea,
should result in production of megalomicin. A 12 kb DNA fragment carrying all
the genes between the leftmost Xho I site and the EcoR I site (Figure 9) was
integrated in the chromosome of S. erythraea using the site-specific integrating
vector pSET152 (Bierman et al., 1992). It was surmised that the left and right ends
of this fragment would contain necessary promoter regions for transcription of the
convergent set of genes in M. megalomicea and that they would likely operate in
S. erythraea.

Fermentation broth from S. erythraea/pKOS97-42, which contains the
integrated meg genes, was analyzed by LC/MS and compared to LC/MS profiles
of the parent S. erythraea strain without the meg genes, as well as to megalomicin
standards purified from M. megalomicea. The new strain was found to produce a
mixture of erythromycin A and various megalomicins (~4:1 ratio), thereby
showing that the predicted megosamine biosynthetic and glycosyltransferase genes
are contained within the cloned meg fragment. The two most abundant congeners
identified were megalomicins B and C1. Megalomicin A and C2 were also
detected in smaller amounts. The presence of the megalomicins B, C1 and C2 also
provides direct evidence for the function of the O-acyltransferase, MegY, which
is present in the integrated meg fragment.

Discussion 

The homologies observed among modular PKSs enabled the use of ery
PKS genes to clone the meg biosynthetic gene cluster from M. megalomicea. The
close similarities between the megalomicin and erythromycin biosynthetic
pathways are also reflected in the overall organization of their genes and in the high
degree of homology of the corresponding individual gene-encoded polypeptides.
Production of 6-dEB from meg DEBS in S. lividans and conversion of
erythromycin to megalomicin using the megD genes in S. erythraea provide
direct evidence that the identified gene cluster is responsible for synthesis of
megalomicin.

As seen in Figure 9, the ~40 kb segments of the two clusters beginning
with ery/megBV on the left through the ery/megF genes retain a nearly identical
organizational arrangement. The notable differences in this region are eryG and
IS1136, which are absent from the segment of the meg cluster analyzed. The eryG
gene encodes an S-adenosylmethionine (SAM)-dependent mycarosyl
methyltransferase that converts erythromycin C to erythromycin A (Figure 8)
(Weber et al., 1990; Haydock et al., 1991). The mycarose moiety is modified by
esterification (MegY) in megalomicin biosynthesis (Figure 8) and, therefore, the
absence of an eryG homolog would be expected in the meg cluster. The IS1136
element located between eryAI and eryAII (Donadio and Staver, 1993) is not
known to play a role in erythromycin biosynthesis and its origin in the ery cluster
has not been determined.

Upstream of the common meg/eryBIV and BV genes, the gene clusters
diverge. The ~6 kb segment between eryBV and eryK, the left border of the ery
gene cluster (Pereda et al., 1997), contains the remaining genes required for
mycarose (eryBVI and BVII) and desosamine biosynthesis (eryCIV, CV, and CVI)
and the C-12 hydroxylase (eryK) (Stassi et al., 1993). In contrast, the region
upstream of megBV encodes a set of genes (megDI-DVII and megY) which can
account for all the activities unique to megalomicin biosynthesis (Figure 9). Since
introduction of this meg DNA segment into S. erythraea results in production of
megalomicins, it is clear that these genes encode the functions for TDP-
megosamine biosynthesis and transfer to its putative substrate erythromycin C, and
to acylate megalomicin A (Figure 8). The remaining region upstream of megDVI
should therefore encode genes only for mycarose and desosamine biosynthesis.

Olano et al. (Olano et al., 1999) have recently described a pathway for
biosynthesis of TDP-L-daunosamine, a deoxysugar component of the antitumor
compounds daunorubicin and doxorubicin produced by Streptomyces peucetius.
Their pathway proposes four steps from the intermediate TDP-4-keto-6-
deoxyglucose controlled by the gene cluster dnmJQTUVZ, although the functions
for dnmQ and dnmZ could not be identified and the precise order of reactions in
the pathway could not be determined. The genes dnmT, dnmU, dnmJ and dnmV
each have proposed counterparts in the meg cluster, megT, megDIV, megDII, and
megDV, respectively (see Figure 10).

It is possible to describe a pathway to convert TDP-2,6-dideoxy-3,4-
diketo-D-hexose (or its enol tautomer), the last intermediate common to the
mycarose and megosamine pathways, to TDP-megosamine through the sequence
of 5-epimerization, 4-ketoreduction, 3-amination, and 3-N-dimethylation
employing the genes megDIV, megDV, megDII, and megDIII. This employs the
same functions proposed for biosynthesis of TDP-daunosamine by Olano et al.,
but in a different sequential order. However, it does not account for the megDVI
and megDVII genes since their activities are not required for this route. A parallel
pathway which employs these genes is also shown in Figure 10. In this alternate
route, 2,3-reduction and 3,4-tautomerization are performed by the megDVII and
megDVI gene products, respectively. A unified single pathway that employs both
4-ketoreduction (megDV) and 2,3-reduction (megDVII) could not be determined.
Because the entire gene set from megDVI through megDVII was introduced in S.
erythraea to produce TDP-megosamine, it is not possible to determine which, if
either, of the two alternative pathways is operative, but this can be addressed
through systematic gene disruption and complementation.

The 48 kb segment sequenced also contains genes required for synthesis of
TDP-L-mycarose and TDP-D-desosamine (Figure 10). For the latter, megCII, which
encodes a putative 3,4-isomerase, the first step in the committed TDP-desosamine
pathway, appears to be translationally coupled to megAIII, almost exactly as its
erythromycin counterpart, eryCII, was found translationally coupled to eryAIII
(Summers et al., 1997). The high degree of similarity between MegCII and EryCII
suggests that the pathways to desosamine in the megalomicin- and erythromycin-
producing organisms are most likely the same. Similarly, the finding that megBII
and megBIV, encoding a 2,3-reductase and 4-ketoreductase, have close
homologs in the mycarose pathway for erythromycin also suggests that TDP-L-
mycarose synthesis in the two host organisms is the same.

Of interest are the two genes that encode putative 2,3-reductases, megBII
and megDVII. Because MegBII most closely resembles EryBII, a known mycarose
biosynthetic enzyme (Weber et al., 1990), and because megBII resides in the same
location of the meg cluster as its counterpart in the ery cluster, megBII is assigned
to the mycarose pathway and megDVII to the megosamine pathway. Furthermore,
the lower degree of similarity between MegDVII and either EryBII or MegBII
(Table 2) provides a basis for assigning the opposite L and D isomeric substrates
to each of the enzymes (Figure 10). Finally, megT, which encodes a putative 2,3-
dehydratase, is also related to a gene in the ery mycarose pathway, eryBVI. In S.
erythraea, the proposed intermediate generated by EryBVI represents the first
committed step in the biosynthesis of mycarose (Figure 10). However, the
proposed pathways in Figure 10 suggest this may be an intermediate common to
both mycarose and megosamine biosynthesis in M. megalomicea. Therefore, megT
is named following the designation of the equivalent gene in the daunosamine
pathway, dnmT (Olano et al., 1999).

The preferred host-vector system for expression of meg DEBS described
here has been used previously for the heterologous expression of modular PKS
genes from the erythromycin (Kao et al., 1994a; Ziermann and Betlach, 1999),
picromycin (Tang et al., 1999) and oleandomycin pathways, as well as for the
generation of novel polyketide backbones where domains have been removed,
added or exchanged in various combinations (McDaniel et al., 1999). Recently,
hybrid polyketides have been generated through the co-expression of subunits
from different PKS systems (Tang et al., 2000).

Expression of the megDVI-megDVII segment in S. erythraea and the
corresponding production of megalomicins in this host establishes the likely order
of sugar attachment in megalomicin synthesis. Furthermore, it provides a means to
produce megalomicin in a more genetically friendly host organism, leading to the
creation of megalomicin analogs by manipulating the PKS. Over 60 6-dEB
analogs have been produced by combinatorial biosynthesis using the ery PKS
(McDaniel et al., 1999; Xue et al., 1999). The titers of megalomicin could also be
significantly increased above the 5 mg/L obtained from M. megalomicea by
introducing the genes into an industrially optimized strain of S. erythraea, many of
which can produce as much as 10 g/L of erythromycin.

Example 2

Stabilizing meg PKS Expression Plasmid by Codon Engineering

Materials and methods

All bacterial strains were cultured and transformed as described in
Example 1.

Fermentation of Streptomyces and diketide feeding

Primary Streptomyces transformants were picked and placed in 6 mL of
TSB liquid medium with 50 µg/L of thiostrepton and grown at 30°C. When the
culture showed some growth (3-4 days), it was transferred into a 250 mL flask
containing 50 mL of R6 medium (pH 7.0) with 25 µg/L of thiostrepton and 1 g/L of
diketide ((2S,3R)-2-methyl-3-hydroxyhexanoate N-propionyl cysteamine thioester)
and placed in a 30°C incubator for 7 days.



Changing codons and making plasmids

There are several identical sequences in the coding sequences for module 2
and module 6 of the megalomicin PKS gene cluster. Expression plasmids
containing the full length megalomicin PKS appeared to be somewhat unstable
and subject to deletion in recA+ strains like ET12567 and Streptomyces by intra-
plasmid homologous recombination. To prevent significant homologous
recombination and so stabilize expression plasmids, the codons of two regions of
the module 6 coding sequence that are identical to regions in the module 2 coding
sequence were changed without changing the sequence of the protein encoded. The
two regions changed in module 6 were from the 26,739th base to the 27,267th base
and from the 27,697th base to the 27,987th base, which were identical to the region
from the 6810th base to the 7338th base and the region from the 7778th base to the
8068th base, respectively. The start codon of the loading domain of the meg PKS
was set to be the 1st base. These sequences are shown below.



> 6810-7338 Sequence in Module 2

TTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG
GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT 
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 23)

> 26736-27267 Sequence in Module 6

CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG
GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 24) 
> 26736-27267 Sequence with Codon Changes
CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC
TCGGCCGTCAACCAAGACGGCGCGTCAAACGGCCTCGCCGCGCCCTCCGGCGTCGCCCAG 
CAGCGCGTCATACGCCGCGCGTGGGGACGCGCCGGAGTATCGGGCGGCGACGTCGGAGTC 
GTCGAGGCCCACGGCACCGGCACCCGCCTCGGGGATCCCGTCGAGCTGGGCGCCCTCCTG 
GGCACGTACGGCGTCGGCCGCGGCGGCGTCGGCCCGGTCGTCGTCGGCAGCGTCAAGGCC 
AACGTCGGCCACGTCCAGGCCGCGGCCGGCGTCGTCGGGGTCATCAAGGTCGTCCTCGGC
CTCGGCCGCGGGCTGGTCGGCCCGATGGTCTGCCGCGGCGGCCTCAGCGGCCTCGTCGAC 
TGGTCGTCCGGCGGCCTGGTCGTCGCGGACGGGGTCCGCGGCTGGCCGGTCGGCGTCGAC 
GGCGTCCGCCGGGGCGGCGTCTCGGCGTTCGGCGTCAGCGGGACGAAT (SEQ ID NO: 25) 



> 7778-8068 Sequence in Module 2

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT 
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 26)

> 27697-27987 Sequence in Module 6 

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 
27) 

> 27697-27987 Sequence with Codon Changes
CGTGGAGTGCGATGCGGTCGTGTCGAGCGTCGTCGGCTTCAGCGTGCTGGGCGTCCTGGA
GGGCCGCAGCGGCGCCCCGAGCCTGGACCGCGTCGACGTGGTCCAGCCGGTCCTGTTCGT 
GGTCATGGTCAGCCTGGCCCGCCTGTGGCGCTGGTGCGGCGTGGTCCCGGCCGCCGTGGT 
CGGCCACAGCCAGGGCGAGATCGCCGCCGCGGTCGTGGCCGGCGTCCTGAGCGTCGGCGA 
CGGCGCCCGCGTCGTGGCCCTGCGCGCCCGCGCCCTGCGCGCCCTGGCCGG (SEQ ID NO: 28)
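
Whether a recoded region still encodes the same protein can be checked by
translation. The Python sketch below is illustrative only and is not the
procedure used to design the sequences. It takes the first 19 complete codons
of the 27697-27987 region as listed above (SEQ ID NO: 27 and SEQ ID NO: 28) and
assumes that the reading frame begins at the second listed base, which is an
inference from the pattern of substitutions rather than something stated in the
text. The function names are hypothetical, and the standard genetic code is
assumed.

```python
# Illustrative check that codon changes are silent (names are hypothetical).
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate an in-frame DNA string, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def longest_identical_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive identical positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best

# First 19 codons of each listing, assuming the frame starts at base 2.
original = ("GTG GAG TGT GAT GCG GTG GTG TCG TCG GTG GTG GGG TTT TCG "
            "GTG TTG GGG GTG TTG").replace(" ", "")
recoded = ("GTG GAG TGC GAT GCG GTC GTG TCG AGC GTC GTC GGC TTC AGC "
           "GTG CTG GGC GTC CTG").replace(" ", "")

if __name__ == "__main__":
    print("same protein:", translate(original) == translate(recoded))
    print("nucleotide mismatches:", mismatches(original, recoded))
    print("longest identical run:", longest_identical_run(original, recoded))
```

For this excerpt, and under the stated frame assumption, both sequences
translate to the same 19 residues while differing at 16 of 57 nucleotides. The
longest uninterrupted stretch of identity is the quantity of interest for
homologous recombination, since recoding is intended to break up long identical
runs.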



Three pieces of DNA from the two regions above were synthesized and verified by 
Retrogen, and the synthesized DNAs were cloned into pCR-Blunt II -TOPO, as 
shown in the Table 3 below. 

Table 3. Plasmids containing synthesized DNA

Plasmid        Cloning sites and positions in meg PKS
pKOS97-1613    PstI-BamHI, 26,739th-26,947th base
pKOS97-1622    BamHI-BsmI, 26,947th-27,267th base
pKOS97-1628    SfaNI-FseI, 27,697th-27,987th base



Assembly of the expression plasmid 

First, ligation of the PstI-BamHI fragment of pKOS97-1613, the BamHI-
BsmI fragment of pKOS97-1622 and BsmI-PstI linearized pKOS97-90 produced
pKOS97-151. Then, the insertion of the SfaNI-FseI fragment of pKOS97-1628
into pKOS97-151 gave rise to pKOS97-152. Then, the PstI-BlpI fragment of
pKOS97-125 was used to replace the PstI-BlpI fragment of pKOS97-90a and
produced pKOS97-160.

The final expression plasmid (in pRM5), pKOS97-162, was obtained by
inserting the BglII-NheI fragment of pKOS97-160 into the BglII-NheI sites of
pKOS108-04.

Another expression plasmid pKOS97-152a was made by a four-fragment
ligation. The four fragments were a BlpI-XbaI fragment (containing a cos site) of
pKOS97-92a, a BglII-PstI fragment of pKOS97-81, a PstI-BlpI fragment of
pKOS97-152, and a BglII-XbaI fragment of pKOS108-04 (as the vector).

Tests of the constructed plasmids showed that the plasmids containing the
modified coding sequences were more stable than plasmids containing unmodified
coding sequence.


Example 3
Construction of Ole-Meg Hybrid PKS
Construction of pRM1-based pKOS098-48 for the expression of OlePKS modules
1-4.

The 240-bp fragment containing the 3'-end portion of the oleAII gene (at nt
11210-11452; the first base of the start codon of oleAII is nt 1) was PCR amplified
with primers N98-38-1 (5'-GAACAACTCCTGTCTGCGGCCGCG-3') (SEQ ID
NO: 29) and N98-38-3 (5'-
CGGAATTCTCTAGACTCACGTCTCCAACCGCTTGTCGAGG-3') (SEQ ID
NO: 30). The fragment contains a naturally occurring NotI site at its 5'-end and
the engineered XbaI (bold) and EcoRI sites (underline) at its 3'-end following the
oleAII stop codon. pKOS38-189 was digested with EcoRI and NotI to give five
fragments of 8 kb, 5 kb, 4 kb, 2.5 kb and 2 kb. The 8-kb EcoRI-NotI fragment
containing oleAII gene nt 2961 to nt 11210 and the 240-bp NotI, EcoRI treated
PCR fragment were ligated into Litmus 28 at the EcoRI site via a three-fragment
ligation to give pKOS98-46. The 8.2-kb EcoRI fragment from pKOS98-46 was
cloned into pKOS38-174, a pRM1 derived plasmid containing oleAI and nt 1 to nt
2960 of oleAII to give pKOS98-48.

Construction of pSET152-based pKOS98-60 for the expression of megPKS
modules 5-6.

The 360-bp fragment containing nt 1 to nt 366 of megAIII was PCR
amplified with primers N98-40-3 (5'-
TCTAGACTTAATTAAGGAGGACACATATGAGCGAGAGCAGC-
GGCATGACCG-3') (SEQ ID NO: 31) and N98-40-2 (5'-AACGCCTCCCAG-
GAGATCTCCAGCA-3') (SEQ ID NO: 32). A PacI site and an NdeI site as well
as the ribosome binding site were introduced at the 5'-end of the megAI start
codon. The 360-bp PacI-BglII fragment was inserted into pKOS108-06 replacing
the 22-kb PacI-BglII fragment to yield pKOS98-55. The 10-kb PacI-XbaI
fragment containing the megAIII gene and the annealed oligos N98-23-1 (5'-
AATTCATAGCCTAGGT-3') (SEQ ID NO: 33) and N98-23-2 (5'-
CTAGACCTAGGCTATG-3') (SEQ ID NO: 34) were ligated to PacI and EcoRI
treated pSET152 derivative pKOS98-14 via a three-fragment ligation to give
pKOS98-60.



Example 4

Conversion of Erythronolides to Erythromycins

A sample of a polyketide (~50 to 100 mg) is dissolved in 0.6 mL of
ethanol and diluted to 3 mL with sterile water. This solution is used to overlay a
three day old culture of Saccharopolyspora erythraea WHM34 (an eryA mutant)
grown on a 100 mm R2YE agar plate at 30°C. After drying, the plate is incubated
at 30°C for four days. The agar is chopped and then extracted three times with 100
mL portions of 1% triethylamine in ethyl acetate. The extracts are combined and
evaporated. The crude product is purified by preparative HPLC (C-18 reversed
phase, water-acetonitrile gradient containing 1% acetic acid). Fractions are
analyzed by mass spectrometry, and those containing pure compound are pooled,
neutralized with triethylamine, and evaporated to a syrup. The syrup is dissolved
in water and extracted three times with equal volumes of ethyl acetate. The
organic extracts are combined, washed once with saturated aqueous NaHCO3,
dried over Na2SO4, filtered, and evaporated to yield ~0.15 mg of product. The
product is a glycosylated and hydroxylated compound corresponding to
erythromycin A, B, C, and D but differing therefrom as the compound provided
differed from 6-dEB.

Example 5 

Measurement of Antibacterial Activity

Antibacterial activity is determined using either disk diffusion assays with
Bacillus cereus as the test organism or by measurement of minimum inhibitory
concentrations (MIC) in liquid culture against sensitive and resistant strains of
Staphylococcus pneumoniae.


Example 6 
Evaluation of Antiparasitic Activity 
Compounds can initially be screened in vitro using cultures of P. falciparum
FCR-3 and K1 strains, then in vivo using mice infected with P. berghei. Mammalian
cell toxicity can be determined in FM3A or KB cells. Compounds can also be
screened for activity against P. berghei. Compounds are also tested in animal studies
and clinical trials to test the antiparasitic activity broadly (malaria,
trypanosomiasis and leishmaniasis).

The invention having now been described by way of written description
and example, those of skill in the art will recognize that the invention can be
practiced in a variety of embodiments and that the foregoing description and
examples are for purposes of illustration and not limitation of the following
claims.

Claims

1. An isolated nucleic acid comprising a nucleotide sequence
encoding a domain of megalomicin polyketide synthase (PKS) or a megalomicin
modification enzyme.

2. The isolated nucleic acid of claim 1, which encodes a PKS open
reading frame (ORF) selected from the group consisting of megAI, megAII and
megAIII.

3. The isolated nucleic acid of claim 1, wherein the PKS domain is
selected from the group consisting of a TE domain, a KS domain, an AT domain,
an ACP domain, a KR domain, a DH domain, and an ER domain.

4. The isolated nucleic acid of claim 1, wherein the nucleic acid
comprises the coding sequence for a loading module, a thioesterase domain, and
all six extender modules of megalomicin PKS.

5. The isolated nucleic acid of claim 1, which encodes a megalomicin
modification enzyme that is involved in the conversion of 6-dEB into a
megalomicin.

6. The isolated nucleic acid of claim 5, which encodes a megalomicin
modification enzyme that is involved in the biosynthesis of mycarose,
megosamine or desosamine.

7. The isolated nucleic acid of claim 1, wherein the nucleic acid
codons of homologous regions within the PKS or the megalomicin modification
enzyme coding sequence have been changed to reduce or abolish the homology
without changing the amino acid sequences encoded by said changed nucleic acid
30 codons. 
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8. The isolated nucleic acid of claim 1, which isolated nucleic acid 
fragment hybridizes to a nucleic acid having a nucleotide sequence set forth in the 
SEQ. ID NO: 1. 

5 9. A polypeptide, which is encoded by the isolated nucleic acid 

fragment of claim 1. 

1 0. A recombinant DNA expression vector, comprising the isolated 
nucleic acid of claim 1 operably linked to a promoter. 

10 

11. A recombinant host cell, comprising the recombinant DNA 
expression vector of claim 10. 

1 2. The recombinant host cell of claim 1 1 , which is a Streptomyces or 
15 Saccharopolyspora host cell. 

13. A recombinant host cell of claim 11, which comprises:

a) at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprises a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme
operably linked to a promoter; or

b) at least one autonomously replicating recombinant DNA expression
vector and at least one modified chromosome, each of said vector(s) and each of
said modified chromosome comprises a recombinant DNA compound encoding a
megalomicin PKS domain or a megalomicin modification enzyme operably linked
to a promoter.

14. A hybrid PKS that comprises a polypeptide of claim 9 and is
composed of at least a portion of a megalomicin PKS and at least a portion of a
second PKS for a polyketide other than megalomicin.

15. The hybrid PKS of claim 14, wherein the second PKS is selected
from the group consisting of a narbonolide PKS, an oleandolide PKS, and a DEBS
PKS.

16. The hybrid PKS of claim 15 that is composed of the megAI and
megAII gene products and the oleAIII gene product.

17. The hybrid PKS of claim 16, wherein the KS domain of module 1
of the megAI gene product has been inactivated by mutation.

18. A method of producing a polyketide, which method comprises
growing the recombinant host cell of claim 11 under conditions whereby the
megalomicin PKS domain encoded by the recombinant expression vector is
produced and the polyketide is synthesized by the cell, and recovering the
synthesized polyketide.

19. A recombinant host cell that comprises a recombinant expression 
vector that encodes a megalomicin modification enzyme. 

20. The recombinant host cell of claim 19 that produces megosamine
and can attach megosamine to a polyketide, wherein said host cell, in its
naturally occurring non-recombinant state, cannot produce megosamine.
[Drawing sheets, including sheets 40/70, 41/70 and 42/70 (substitute sheets,
Rule 26): the figure content, apparently sequence data printed sideways on the
page, is not recoverable from the extracted text.]

tn rji rji rji aj o tntJiO O O rd tn u O O tno rjno tntntn 

P tn On a O tT> O tnrd O O rcn U O O O t^tno O tP O 
tn o P rd tn OOfdfdxJ-POfdtnfdOrd-P-POOtn 
O tntPtntnDitntntntnDitnO fd tn O tn tn O tntno tn 
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O _p Q o tr> tn O O-P O rd t» tn O -P O O O OO 

tn rd tnO O O-P rd - tn -P 4-> O O-P tnrd 4-> rd O-P 

O Cn rd tn rd tr> O tn tr> O tn O rd O tr> O tntr»tr»t^O tn -P 

tPtnU tn O tn U O U Cn tn ^ O 4-) U tr> tn O U tn 

-P rd O tn-P tn tr» rd rd tn tn -P rd p> -p rd D^O O O O O-P 

O rd tj» u tj^ Cn O tntntntno tr> o -P rd O rr> tn tn 

u tno Di u (d u O tnrdtntnoc^o u tn o u u tn 

O P> -P O t^U-P-P O-P O tnO-P tnU-P O-P O O rd rd 

tr> tn O rd O cji U O tr> tn o -P O tr> -P tntnrd rd o tn 

tn o U O tri O -p O O t> O tr> O OOtnCnooot^ 

rd O tnO-P rd tr> tr> -P OCnrd-P rd rd rd rd O-P O-P O 
Ot^tntnO tn -p tr> O o -P -P tr> tn -P tn rd tn o rd tr> tn 

-Prd-P rd O tJi rd rd rd O O-P tn O O O -P O rd-PP-PtJ^ 

-P tn .p o tn rd & tP4-) rd rd tn tn tn tri O tn rd O O-P O ^ 

tn O O tn tn O tn O O O O O O -P tn O O O O tn tn tn O 

O-P o -P -P -P -P O tn O O O rd tn tn -P tn tn tn tn O -P O 

tn O rd O tn tn O tn O tn ^ ^ tji u rjn rji tn o tn tn tn 

tn O tn O tn O O tn O tn tn O OO O O O tn O O O O tn 
OOrdOO-PtnO-PO-PO-PrdrdOO-POrd-POtn 

O-P O rd rji bi u tntntntntntntntno rd rd O O rd-PO 
tn tn O O O tn O tn O tn tn O O-P OO O tn O O O tn-P 
tn -p O-P rd -P rd -P -P tn -P O-P tn O O O rd O rd O rd tn 
tn tn tn tn tn O O rj^ tn -P tn tn O O tn tn tn tn rd tn o O O 
OtnOOt^OtnOOOtntntnOOOOOOOrdtnO 
rd rd -P tn O O-P rd -P -P -P tn -P -P -P P O -P O-P tn O 
-P tn tn tn O tn O O rjn U O tn tn tn tn P O tn -P tn t^ tn tn 
tntnOtnrdOOtnOtnOOOtnOOOtntnOOOtn 
rdOOtntnrdrd-PO-POO-POOO-POtn.POO-P 
tn cji tn O tn tn tn tn tn tn rd O-P tn tn rj» tn O O t7* tn tn -p 

O O O tn O tn O tn rj^ rji U O O O tn tn O rd tn tn rd tn tn 
rd rd rd tn O -P tn -P rd tn O tn tn P OP O rd tn O O rdO J 
■PP rj)|J rj^^^rjiO cj^fd O.U O O rjirjirjiUditnU ^ r ^ 
OOOOOtnOtnrdOOtnOtntnrdOtnOtnOtnO 
tn-P-POOOrdOOrdOOrdrd-PO-PO-P-PO-P-P 
tn tn O tn O rd tn tn O O tn O tn O tn tn tn O tn tn tn -P tn 
tn tn O O O O tn rd -P O tntntnotnOO-Ptno tn O 
4->4^-P-POOr^OOtnO-POOOrdtnrdtnOrdtnO 
O tn O -P rd tn O tn tn O O tn O O O tn o -P tn O tn tn tn 
rd O tn o tn o o tn tn o tn o O-P-P O rd tn tn tn tn o -P 

rdnjpOOP>tnOOOtncdOrdtnpppOtn-POO 

tn tn tn o tn tn tn o O tn -P tn -P rd tn rd tn rd -P tn tn tn tn 

O O tn o tn O O O-P tn tn O tn O O O tn O tn O tn tn tn 

O rji U -P -P P tn O tn rd rd rd rd O O rd tn tn -P O -P -P 

tn tn O O tn O O rd tn tn O tn tn tn tri p tn tn tn -P tn O rd 

O O O tn tn tn tn tn -P O O tn Ovtn O O tn O O O tn U tn 

-P rd rd rd P rd O O tn O rd tn O O-P tn rd P -P O tnrdO^ 

O O tnrd tr» O O O tr> Cn tn tn O O tntno tr^tnO tn tn 

O tJ> O Cnt^-P CJ) o O tnO O tntntnO Di tJi tn d> U 

O rd -P tn rcn rd rd O O rd O rd -P OCno O rd tr> tn rd P 

rd O-P tnO tn o tntnrd tn O Otnord O tritntn-P O tr> 
O fcnt^O rd O O tr»0 tn O O O tr> o O O tntntntn 
rdtnoO-P OrdOtnOrdOOOOrdO-POrdtn-Ptn 
tn-P rji O tnrd tJ^O O O tTi-P tntnrdrd tnOrdO rji U 
O tno O tji rji U tn U O O Otnr^UdirjiOO O tno 
rd O rd O O O tnO-P rd-P-P tn O P rjn -p 4-> -P -P rd O tr» 
-P rjn 4J tn tn O O rd O tr> O tr»4-» O O tn O O tn -P tnrjncn 
OOt^04^04JOD^Otntr»OOtnOOOtntnOOO 
rd O cji rj> O rd rd O-P rd tn rd tnO O o P -P -P O rd O 
O CntnO rd CntJitnO Cr»-p tJiOO OO Ob^t^O O -P tji 



\ — I x — I rH i — It — I t — I \ — lr— It — IrHi — It — \ \ — li — It — I H H H i — It — I H . 
C\!(X^OVOC^CD^OU)CN]OD^O^CNlOO^O^C\lCX)^ 

^^Ln^^^^(D(^i^ooH(Nc^lo r )^o^LOLn^lo^ 

OOOOOOOOOOrHtHtHrHx— IrlrlrlrlrlHHrl 

c^c^(^r^r^rorr)cor^cor^r^c^c^r^ 



[Drawing sheet 54/70: figure content not recoverable from the scan.]

[Drawing sheet 55/70: figure content not recoverable from the scan.]

[Drawing sheets 56/70 onward: figure content not recoverable from the scan.]

[Drawing sheet 60/70: figure content not recoverable from the scan.]

SEQUENCE LISTING

<110> Kosan Biosciences, Inc. 

<120> Recombinant Megalomicin Biosynthetic 
Genes and Uses Thereof 

<130> 300622004740 

<140> To be assigned 
<141> Herewith 

<150> US 60/158,305
<151> 1999-10-08 

<150> US 60/190,024 
<151> 2000-03-17 

<160> 34 

<170> FastSEQ for Windows Version 4.0 

<210> 1 

<211> 47981 

<212> DNA 

<213> Micromonospora megalomicea 

<220> 

<221> CDS 

<222> (1) ... (144)

<223> megBVI (megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase;
SEQ ID NO: 2= translated amino acid sequence

<221> CDS

<222> (928) ... (2061)

<223> megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase,
TDP-4-keto-6-deoxyhexose 3,4-isomerase;
SEQ ID NO: 3= translated amino acid sequence

<221> CDS

<222> (2072) ... (3382)

<223> megDI, rhodosaminyl transferase (eryCIII homolog),
TDP-megosamine glycosyltransferase;
SEQ ID NO: 4= translated amino acid sequence

<221> CDS

<222> (3462) ... (4634)

<223> megG (megY), mycarosyl acyltransferase, mycarose O-acyltransferase;
SEQ ID NO: 5= translated amino acid sequence

<221> CDS

<222> (4651) ... (5775)

<223> megDII, deoxysugar transaminase (eryCI, DnrJ homolog),
TDP-3-keto-6-deoxyhexose 3-aminotransaminase;
SEQ ID NO: 6= translated amino acid sequence

<221> CDS

<222> (5822) ... (6595)

<223> megDIII, daunosaminyl-N,N-dimethyltransferase (eryCVI homolog);
SEQ ID NO: 7= translated amino acid sequence

<221> CDS

<222> (6592) ... (7197)

<223> megDIV, TDP-4-keto-6-deoxyglucose-3,5-epimerase (eryBVII, dnmU
homolog), TDP-4-keto-6-deoxyhexose 3,5-epimerase;
SEQ ID NO: 8= translated amino acid sequence

<221> CDS

<222> (7220) ... (8206)

<223> megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog),
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 9= translated amino acid sequence

<221> CDS

<222> (8228) ... (9220)

<223> megBII-1 (megDVII), TDP-4-keto-L-6-deoxy-hexose 2,3-reductase;
SEQ ID NO: 10= translated amino acid sequence

<221> CDS

<222> (9226) ... (10479)

<223> megBV, mycarosyl transferase, mycarose glycosyltransferase;
SEQ ID NO: 11= translated amino acid sequence

<221> CDS

<222> (10483) ... (11424)

<223> megBIV, TDP-hexose 4-ketoreductase,
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 12= translated amino acid sequence

<221> CDS 

<222> (12181) ... (22821)

<223> megAI; SEQ ID NO: 13= translated amino acid sequence

<221> misc_feature
<222> (12505) ... (13470)
<223> megAI, AT-L

<221> misc_feature
<222> (13576) ... (13791)
<223> megAI, ACP-L

<221> misc_feature
<222> (13849) ... (15126)
<223> megAI, KS1

<221> misc_feature
<222> (15427) ... (16476)
<223> megAI, AT1

<221> misc_feature
<222> (17155) ... (17694)
<223> megAI, KR1

<221> misc_feature
<222> (17947) ... (18207)
<223> megAI, ACP1

<221> misc_feature
<222> (18268) ... (19548)
<223> megAI, KS2

<221> misc_feature
<222> (19876) ... (20910)
<223> megAI, AT2

<221> misc_feature
<222> (21517) ... (22053)
<223> megAI, KR2

<221> misc_feature
<222> (22318) ... (22575)
<223> megAI, ACP2

<221> CDS 

<222> (22867) ... (33555)

<223> megAII; SEQ ID NO: 14= translated amino acid sequence

<221> misc_feature
<222> (22957) ... (24237)
<223> megAII, KS3

<221> misc_feature
<222> (24544) ... (25581)
<223> megAII, AT3

<221> misc_feature
<222> (26230) ... (26733)
<223> megAII, KR3 (inactive)

<221> misc_feature
<222> (26998) ... (27258)
<223> megAII, ACP3

<221> misc_feature
<222> (27393) ... (28590)
<223> megAII, KS4

<221> misc_feature
<222> (28897) ... (29931)
<223> megAII, AT4

<221> misc_feature
<222> (29953) ... (30477)
<223> megAII, DH4

<221> misc_feature
<222> (31396) ... (32244)
<223> megAII, ER4

<221> misc_feature
<222> (32257) ... (32799)
<223> megAII, KR4

<221> misc_feature
<222> (33052) ... (33312)
<223> megAII, ACP4

<221> CDS 

<222> (33666) ... (43271) 

<223> megAIII; SEQ ID NO: 15= translated amino acid sequence

<221> misc_feature
<222> (33780) . . . (35027)
<223> megAIII, KS5

<221> misc_feature
<222> (35385) ... (36419)
<223> megAIII, AT5

<221> misc_feature
<222> (37068) ... (37604)
<223> megAIII, KR5

<221> misc_feature
<222> (37860) ... (38120)
<223> megAIII, ACP5

<221> misc_feature
<222> (38187) . . . (39470)
<223> megAIII, KS6

<221> misc_feature
<222> (39795) ... (40811)
<223> megAIII, AT6

<221> misc_feature
<222> (41406) ... (41936)
<223> megAIII, KR6

<221> misc_feature
<222> (42168) . . . (42425)
<223> megAIII, ACP6

<221> misc_feature
<222> (42585) ... (43271)
<223> megAIII, TE

<221> CDS 

<222> (43268) . . . (44344) 

<223> megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase;
SEQ ID NO: 16= translated amino acid sequence 

<221> CDS 

<222> (44355) . . . (45623) 

<223> megCIII, desosaminyl transferase, desosamine glycosyltransferase;
SEQ ID NO: 17= translated amino acid sequence 

<221> CDS 

<222> (45620) . . . (46591) 

<223> megBII-2 (megBII), TDP-4-keto-6-deoxy-L-glucose 2,3 dehydratase,
TDP-4-keto-6-deoxyglucose 2,3 dehydratase; 
SEQ ID NO: 18= translated amino acid sequence 

<221> CDS 

<222> (46660) ... (47403) 

<223> megH, TEII; SEQ ID NO: 19= translated amino acid sequence 
<221> CDS 

<222> (47411) . . . (47980) 

<223> megF, C-6 hydroxylase; SEQ ID NO: 20= translated amino acid sequence 
<400> 1 

ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 60
gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 120
atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 180

taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 240 

cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 300 

gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 360 

tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 420 

gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 480

ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 540

catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 600 

cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 660 

ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 720 

gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 780 

ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 840

ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 900 

tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 960 

gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 1020

atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 1080 

ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 1140

gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 1200 

caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 1260 

accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 1320 

ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 1380 

gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 1440

tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 1500 

accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 1560

gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 1620 

gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 1680 

cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 1740

gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 1800 

gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 1860 

cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 1920 

gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 1980 

ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 2040

cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 2100 

caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 2160 

ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 2220 

cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat 2280 

cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 2340

cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 2400 

ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 2460 

ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 2520 

gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 2580 

ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 2640 

gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 2700 

cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 2760 

gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 2820 

tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc 2880 

catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga 2940 

cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt 3000 

gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 3060 

ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 3120 

ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 3180 

gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct 3240 

cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 3300 

catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 3360 

cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg 3420 

aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 3480

tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 3540 

tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaacagctac gtgtacgccg 3600 

acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 3660 

tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc 3720

gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 3780 
tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 3840
tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 3900
tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 3960
tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac 4020
cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 4080
ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 4140
tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 4200
ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg 4260
ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 4320
ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg 4380
tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 4440
ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 4500
tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 4560
gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 4620
cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 4680
gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 4740
agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 4800
atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 4860
gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 4920
ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac 4980
ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg 5040
gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 5100
ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 5160
gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 5220
tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 5280
ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc 5340
cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 5400
gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac 5460
tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac 5520
gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 5580
tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 5640
gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc 5700
atgtacccct ccctccctca cgacctgcag gacagggtga tcgaggcggt gcgggaggtc 5760
atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 5820
catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 5880
catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct 5940
cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 6000
gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct 6060
gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 6120
cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 6180
caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 6240
cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 6300
cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 6360
caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt 6420
ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc 6480
ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca 6540
cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 6600
gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 6660
ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 6720
ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac 6780
ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg 6840
gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag 6900
ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc 6960
tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag 7020
gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac 7080
ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg 7140
atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc 7200
cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt 7260
cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt 7320
cgtgccgtcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg 7380
agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag 7440
gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg 7500

cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt 7560 

cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga 7620

gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa 7680

ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg 7740

ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc 7800 

cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta 7860

cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg 7920 

acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt 7980

ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc 8040 

tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac 8100 

ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc 8160 

ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg 8220 

gttcgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc 8280 

ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg 8340

gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac 8400 

cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg 8460

ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg 8520 

ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca 8580 

cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca 8640

ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct 8700 

gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac 8760

tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt 8820 

ggtgcgtctg gtagatgtcg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg 8880 

cggcgacgat gtgtcgggcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc 8940 

ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc 9000 

cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga 9060 

tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca 9120 

cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc 9180 

cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg 9240 

gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag 9300 

cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc 9360

gagagcctgc cagagggtgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc 9420 

cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg 9480 

cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc 9540

ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg 9600 

caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag 9660 

ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc 9720

cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga 9780 

ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa 9840

gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag 9900 

gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc 9960 
gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc 10020

ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt 10080 

gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc 10140

gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc 10200 

gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt 10260

gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc 10320 

gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc 10380 

cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa 10440
gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat 10500

gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt 10560 

cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt 10620 

ctccggtgga gggacgctga cgacgggcac cgcagggttg ccggtctgac gtgccacgct 10680 

ggcggcgacg gtctcgaaga tctcgccgag gggtcgggcc tcgtccgcgc tcggcgtcca 10740

gacgtcgccg accagcgcct cgtggttgtg cagtgcggcg gtgaacgcgg tggccacgtc 10800 

ctcgacgtgc aggaggttgc ggcgcacgct gccctcgtgc cacatcgtga tcggctcacc 10860 

ggcgagggct cgccggatca tggcggtgac gacaccccgg ccggtctgcc ccgacgggcc 10920 

gctgtggccg tagatcgcgg gcaggcgcag gatcaccccg tcgacgaccc cgtcctcggt 10980 

ggcctgacgc aggatccgct cggcctcgat cttgtgctgg gcgtaccggc tgggggcggc 11040

ggggttcgcg gcctgggtgg tgctggcgaa caggagcacc ggcgcgggtc cgggtcttgc 11100 
ccgcagcgcg gcgacgaggt cgcgcatgat gcccgcgttg acgcgttcgg cctcgggcac
cgtggcggcg ctgcgccagg tcgacccgcc ggcggcgtag gcgaccagat gcacgacgac 
gtcggtgtcg gcgacgacct gcgcgacccg gccgggttcg agcaggtcga ctcgaaggtg 
ctcgatcccg gcgctgcctg gtggctggtc gcgagacccg gtgcgcgcga cggcccgcag 
tcggagaggg tgtgtggtaa attcgcgaag aagggcgctt ccgacgaatc cagaaacgcc 
gagaagtgtg acatgtcttg tcatctacta atgcattccg atagccaccg gcgcatggaa
tccatttgtt ccccccaggg tggtgtcggg tgacaaatcc ggcctcaggt cggcctcaag 
cctctttcga gcgggtgctg aggcttcccg cgtaccctcg gtggcctgcg ttcgggcggg 
tgtcggggaa agggcggatc gaggagttcg gtagggcgtc gcggcgcgta ctccgggact 
gatccgggtc gacgccccga cgcgtgacag ggcgtcgatc cgtgccgccc gtaccgccgg 
ttttcggcga tggtcgcaga ttcctcccga cgtggtggac tcattggttc tcccgggtgt 
ggccgcaccg tcggtggcct cgtcgggggt gtcggagacc gggtcgatcg ccgtccccgg 
ccgtgccgac cagggtcggt ccgtcgccga ggtgggtcac cgtcgggtgg acccggtccg 
ccggcggcca ccgcccgatc gtgcccacct tcgcctccgc gggtaaatgc ttcgtcgatc 
tgatcgacac ttccggcgac gctatcaccg gagcattccc cggcaccacc ggtcgatgcc 
tcgcgctttc caaacaggga aaacagcagc tcacagcggt tccaggcgcc gggcaatcct 
agcgaagagt ctcgatgggg tcaaggtgaa ttctgtcaca gatgtttttg ttaaatgtac 
tttcttcagc caccctcgac gttcatacaa ttggccggca tctctaccaa gggggagtga 
gtggttgacg tgcccgatct actcggcacc cggactccgc acccagggcc gctcccattc 
ccgtggcccc tgtgcggtca caacgaaccg gagctgcggg cccgcgcccg tcaattgcac 
gcatatctcg aaggcatttc cgaggatgac gtggtggccg tcggcgccgc cctcgcgcgc 
gagacacgcg cgcaggacgg gccgcaccgc gccgtcgtcg tggcctcctc ggtcaccgag 
ctgaccgccg cgctcgccgc cctcgcccag ggccgcccac acccctcggt ggtacgcggt 
gtcgcccgac ccacggcacc ggtggtgttc gtcctgcccg gtcagggcgc ccagtggccc 
ggcatggcga cccgactgct cgccgagtcg cccgtcttcg ccgcggcgat gcgggcctgc 
gagcgggcct tcgacgaggt caccgactgg tcgttgaccg aggtcctgga ctcacccgag 
cacctgcgcc gcgtcgaggt ggtccagccc gcgctcttcg cggtgcagac ctcactggcc 
gccctgtggc ggtcgttcgg ggtgcgaccc gacgccgtac tcggacacag catcggtgag 
ctggccgccg ccgaggtctg cggcgccgtc gacgtcgagg ccgccgcgcg ggccgccgcc 
ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc 
tccccggccg agctggcagc ccgggtcgag cggtgggacg acgacgtcgt gccggccggg 
gtcaacggtc cccggtcggt gctgctcacc ggcgctcccg agcccatcgc acggcgggtc
gccgagctgg cggcacaggg cgtacgcgcc caggtcgtca acgtgtcgat ggcggcgcac 
tcggcgcagg tcgacgccgt cgccgagggc atgcgctcgg cgctgacctg gttcgccccc 
ggcgactccg acgtgcccta ctacgccggc ctcaccggcg ggcggctgga cacccgggaa 
ctcggcgccg accactggcc gcgcagtttc cggctcccgg tgcgcttcga cgaggcgacc 
cgtgcggtcc tggaactgca gcccggcacg ttcatcgagt cgagcccgca cccggtgctg 
gcggcctccc tgcagcagac cctcgacgag gtcgggtccc cggccgcgat cgtgccgacc 
ctgcaacgcg accagggcgg tctgcggcgg ttcctgctcg ccgtggcgca ggcgtacacc 
ggtggcgtga cagtcgactg gaccgccgcc taccccgggg tgacccccgg ccacctgccg 
tcggccgtcg ccgtcgagac cgacgaggga ccctcgacgg agttcgactg ggccgcgccc 
gaccacgtac tgcgcgcgcg gctgctggag atcgtcggcg ccgagacggc cgcgctcgcc
gggcgggagg tcgacgcccg ggccaccttc cgggaactgg gcctcgactc ggtcctcgcg 
gtgcagctgc ggacccgcct cgccacggcg accgggcggg atctgcacat cgccatgctc 
tacgaccacc cgaccccgca cgccctcacc gaggcgctgc tgcgcggccc gcaggaggag 
ccggggcggg gtgaggagac ggcacacccg acggaggccg aacccgacga acccgtcgcc 
gtggtcgcca tggcgtgccg gctgcccggc ggcgtcacct caccggagga gttctgggag 
ctgctggccg aggggcggga cgccgtcggc gggctgccca ccgaccgggg atgggacctg 
gactcgctgt tccacccgga cccgacccgg tcgggcacgg cgcaccagcg cgctggtggc
ttcctcaccg gcgccacctc cttcgacgct gccttcttcg ggctgtcgcc acgggaggca
ctggccgtcg agccgcagca gcggatcacg ttggagctgt cgtgggaggt gctggaacgc 
gccgggatcc ccccgacgtc gttgcggacc tcccggaccg gggtgttcgt cggtctgatc 
ccccaggagt acggcccccg gctggccgag gggggtgagg gcgtcgaggg ctacctgatg 
accgggacca ccaccagcgt cgcctccggt cgggtcgcct acaccctcgg cctggagggg 
ccggcgatca gcgtcgacac cgcctgctcg tcgtcgctcg tcgccgtgca cctggcgtgc 
cagtcgctgc ggcgcggcga gtcgacgatg gcgctcgccg gtggcgtgac ggtgatgccg 
acaccgggca tgctcgtgga cttcagtcgg atgaactccc tcgcccccga cggacggtcc 
aaggcgttct cggccgccgc cgacgggttc ggcatggccg aaggcgcagg gatgctcctg 
ctggaacggc tctcggacgc ccgccgccac ggccacccgg tgctcgccgt gatcaggggc 
accgctgtca actccgacgg cgcgagcaac ggactctccg ccccgaacgg ccgggcccag 
gtccgggtga tccgacaggc cctcgccgag tccgggctga cgccccacac cgtcgacgtc 
gtggagaccc acggcaccgg cacccgcctc ggtgatccga tcgaggcacg ggcgctctcc
gacgcgtacg gcggtgaccg tgagcacccg ctgcggatcg gctcggtcaa gtccaacatc 
gggcacaccc aggccgccgc cggtgtcgcc ggtctgatca aactggtgtt ggcgatgcag 
gccggtgtcc tgccccgcac cctgcacgcc gacgagccgt caccggagat cgactggtcc 
tcgggcgcga tcagcctgct ccaggagccc gctgcctggc ccgccggcga gcggccccgc 
cgggccgggg tgtcctcgtt cggcatcagc ggcaccaacg cacacgcgat catcgaggag 
gcgccgccga ccggtgacga cacccgaccc gaccggatgg gcccggtggt gccctgggtg 
ctctcggcga gcaccggcga ggcgttgcgc gcccgggcgg cgcggctggc cgggcaccta 
cgcgagcacc ccgaccagga cctggacgac gtcgcctact cgctggccac cggtcgggcc 
gcgctggcgt accgtagtgg gttcgtgccc gccgacgcgt ccacggcgct gcggatcctc 
gacgaactcg ccgccggtgg atccggggac gcggtgaccg gcaccgcccg cgccccgcag 
cgcgtcgtct tcgtcttccc cggccaggga tggcagtggg cggggatggc agtcgacctg 
ctcgacggcg acccggtctt cgcctcggtg ctgcgggagt gcgccgacgc gttggaaccg 
tacctggact tcgagatcgt cccgttcctg cgggccgagg cgcagcgccg gacccccgac 
cacacgctct ccaccgaccg cgtcgacgtg gtccagccgg tgctgttcgc ggtgatggtg 
tccctggcgg cccggtggcg ggcgtacggg gtggaaccgg cggccgtcat cggacactcc 
cagggggaga ttgccgcggc gtgtgtggcc ggggcgctct cgctggacga cgcggcccgg 
gcggtggccc tgcgcagccg ggtcatcgcc accatgcccg gcaacggcgc gatggcctcg 
atcgccgcct ccgtcgacga ggtggcggcc cggatcgacg ggcgggtcga gatcgccgcc 
gtcaacggtc cgcgcgcggt ggtggtctcc ggcgaccgtg acgacctgga ccgcctggtc
gcctcctgca ccgtcgaggg ggtgcgggcc aagcggctgc cggtggacta cgcgtcgcac 
tcctcgcacg tcgaggccgt ccgtgacgcg ctccacgccg aactcggcga gttccggccg 
ctgccgggct tcgtgccgtt ctactcgaca gtcaccggcc gctgggtcga gcccgccgaa 
ctcgacgccg ggtactggtt tcgcaacctg cgccacaggg tccggttcgc cgacgcggtc 
cgctccctcg ccgaccaggg gtacacgacg ttcctggagg tcagcgccca cccggtgctc
accacggcga tcgaggagat cggtgaggac cgtggcggtg acctcgtcgc tgtccactcg 
ctgcgacgtg gggccggcgg tcccgtcgac ttcggctccg cgctggcccg cgccttcgtg 
gccggcgtcg cagtggactg ggagtcggcg taccagggtg ccggggcgcg tcgggtgccg 
ctgcccacgt acccgttcca gcgtgagcgc ttctggttgg aaccgaatcc ggcccgcagg 
gtcgccgact ccgacgacgt ctcgtccctg cggtaccgca tcgaatggca cccgaccgat 
ccgggtgagc cgggacggct cgacggcacc tggctgctgg cgacgtaccc cggtcgggcc 
gacgaccggg tcgaggcggc gcggcaggcg ctggagtccg ccggggcgcg ggtcgaggac 
ctggtggtgg agccccggac gggccgggtc gacctggtgc ggcggctcga cgccgtgggt 
ccggtggcgg gcgtgctctg cctgttcgct gtcgcggagc cggcggccga acactccccg 
rr.ggcggtga cgtcgttgtc ggacacgctc gacctgaccc aggcggtggc cgggtcgggc 
cgggagtgtc cgatctgggt ggtcaccgag aacgccgtcg ccgtcgggcc cttcgaacgg
ctccgcgacc cggcccacgg cgcgctctgg gccctcggtc gggtcgtcgc cctggagaac 
cccgccgtct ggggcggcct ggtcgacgtg ccgtcgggtt cggtcgccga gctgtcgcgt 
cacctcggga cgaccctgtc cggcgccggc gaggaccagg tcgccctccg acccgacggg 
acgtacgccc gccggtggtg cagggcgggc gcgggcggca cgggccggtg gcagccccgg 
ggcacggtgc tcgtcaccgg cggcaccggc ggggtcggtc ggcacgtcgc ccggtggctg 
qrccgccagg gcaccccgtg cctggtgctg gccagccgcc ggggaccgga cgccgacggg 
gtcgaggagc tactcaccga actcgccgac ctgggcaccc gggccaccgt caccgcctgc 
gtcgtcaccg accgggagca gctccgtgcc ctcctcgcga ccgtcgacga cgagcacccg
ctgtcggcgg tgttccacgt cgccgcgacg ctcgacgacg gcaccgtcga gaccctcacc 
ggtgaccgca tcgaacgggc caaccgggcg aaggtgctcg gtgcccgcaa cctgcacgag 
ctgacccggg acgccgacct cgacgcgttc gtgctcttct cctcctccac cgccgcgttc 
ggcgcgccgg ggctcggcgg ctacgtcccg ggcaacgcct acctcgacgg tctcgcccag 
cagcgacgca gcgagggact cccggccacc tcggtggcgt ggggtacctg ggcgggcagc 
gggatggccg agggtccggt cgccgaccgg ttccgccggc acggggtcat ggagatgcac 
cccgaccagg ccgtcgaggg tctccgggtg gcactggtgc agggtgaggt agccccgatc 
gtcgtcgaca tcaggtggga ccggttcctc ctcgcgtaca ccgcgcagcg ccccacccgg 
ctcttcgaca ccctcgacga ggcccgtcgg gccgcgcccg gtcccgacgc cgggccgggg 
gtggcggcgc tggccgggct gcccgtcggg gaacgcgaga aggcggtcct cgacctggta 
cggacgcacg cggctgccgt cctcggccac gcctcggccg agcaggtgcc cgtcgacagg 
gccttcgccg aactcggcgt cgactcgctg tcggccctgg aactgcgcaa ccggctgacc 
actgcgaccg gggtccggct ggccacgacg acggtcttcg accacccgga cgtacggacc 
ctggccggac acctggccgc cgaactgggc ggcggatcgg ggcgggagcg gcccgggggc 
gaggccccga cggtggcccc gaccgacgag ccgatcgcca tcgtcgggat ggcctgccgg 
ctgccggggg gagtggactc accggagcag ctgtgggagt tgatcgtctc cgggcgggac 
accgcctcgg cggcacccgg ggaccggagc tgggatccgg cggagttgat ggtctccgac 
acgacgggca

gcgttcttcg 

ctggagacca 

acggacaccg 

cccgaggacg 

cggatcgcgt 

tcgtcgcttg 

gcggtggcgg 

cagggcgcgt 

ggtctggggg 

gggcgtcggg 

gggttggcgg 

gcgggtgtgt 

ggggatccgg 

ggtccggtgg 

gtggtgggtg 
ggggtgcggg

ggggtgtcgg 

gcggaacggc 

ccggtggtgc 

gaccacctgg 
tcgtcggtgg

ttggatcggg 

ttgtggcggt 

gcggcggcgg 

cgggcgcggg 

cgcgacgacg 

gcggtcaacg 

gtcgagcact 

cactccgcac 

ggccgcccgg 

gaactggacg 

gtcgaggcgc 

ctgtcgatgg 

ctggaacgcg 

cacggcgtac 

acctatccct 

gtcgccgact 

ctcgacggtc 

gaggtgcggg 

gtcaccgacc 

ggtgcggccg 

ctgtgggtgg 

gcgacggtgt 

ctgctggatc 

gccggtgccg 

cccaccccgg 

gggggcaccg 

cacctcgccc 

gacctgaccg 

tcggtcggcg 

cacgctgccg 

gacgtggtgg 

gaactgttcc 

tacgccgccg 

cccgccacct 



cccgtaccgc 
ggatctcgcc 
cctgggaggc 
gtgtcttcgt 
aggtcgacgg 
acgtgttggg 
tggcgttgca 
gtggggtgtc 
tggctccgga 
aggggtcggc 
tgttgggtgt 
cgccgtcggg 
cgggtgggga 
tggagttggg 
tggtgggttc 
tgatcaaggt 
ggttgtcggg 
ggtggccggt 
ggacgaatgc 
cggtggaggg 
tgtcggcaaa 
agacgcaccc 
gcttcgacag 
gcggcctcgc 
gtgtggtgtt 
tgtcggttcc 
tggggttttc 
tggatgtggt 
ggtgtggggt 
tggtggcggg 
cgttgcgggc 
tacagaagct
gccccgacgc 
gtgacgggat 
aggtcgagtc 
cgacggtgcc 
ccgactactg 
tggcagcgcg 
cggtcgggga 
acaccgacga 
gcggtgtcgt tcctgcgtga gcggggcgta cggccgatgt cggtgccgag ggcactggaa
gcgctggaac gggtcctcac cgccggggag accgcggtgg tcgtcgccga cgtcgactgg 
gcggccttcg ccgagtcgta cacctccgcc cggccccggc cgctgctcca ccggctcgtc 
acacctgcgg cggcggtcgg cgagcgcgac gagccgcgtg agcagaccct ccgggaccgg 
ctggcggccc tgccccgggc cgagcggtcg gcggagctgg tacgcctggt ccggcgggac 
gccgcagccg tgctcggcag cgacgcgaag gccgtacccg ccaccacgcc gttcaaggac
ctcgggttcg actcgctggc cgcggtccgg ttccgtaacc ggctggccgc ccacaccggt 
ctgcgtctgc cggccaccct ggtcttcgag cacccgaacg ccgcagccgt cgccgacctc 
ctccacgacc gactcggcga ggccggcgag ccgacccccg tccggtcggt gggcgccgga 
ctggccgcgc tggagcaggc cctgcccgac gcctccgaca cggagcgggt cgagctggtc 
gagcgcctgg aacggatgct cgccgggctc cgccccgagg ccggagccgg ggccgacgcc 
ccgaccgccg gtgacgacct gggggaggcc ggcgtcgacg aactcctcga cgcgctcgaa 
cgggaactcg acgccaggtg aacccgaact gaccgcagcc gcagccgaag cagagaccga 
ggacctgtga ctgacaacga caaggtggcg gagtacctcc gtcgtgcgac gctcgacctg 
cgggccgccc gcaagcgcct gcgcgagctg caatccgacc cgatcgcggt cgtcggcatg 
gcctgccgcc taccgggcgg ggtgcacctc ccgcagcacc tgtgggacct cctgcgccag 
gggcacgaga cggtgtccac cttccccacc gggcgcggct gggacctggc cgggctcttc 
cacccggacc ccgaccaccc cggcaccagc tacgtcgacc ggggtgggtt cctcgacgac 
gtggcgggct tcgacgccga gttcttcggg atctccccgc gcgaggccac ggccatggac
ccgcaacagc ggctgctgtt ggagaccagt tgggagctgg tggagagcgc cggcatcgat 
ccgcactccc tgcgtggcac cccgaccggc gtcttcctcg gcgtggcgcg gctcggctac 
ggcgagaacg gcaccgaagc cggtgacgcc gagggctatt cggtgaccgg ggtggcaccc 
gctgtcgcct ccgggcggat ctcctacgcc ctcgggctgg agggtccgtc gatcagcgtg 
gacaccgcgt gctcgtcgtc gttggtggcg ctgcacctgg cggtcgagtc gctgcggctg 
ggcgagtcga gtctcgctgt cgtcggcggg gcggcggtca tggcgacacc aggggtgttc 
gtcgacttca gccgccagcg ggcgttggcc gctgacggca ggtcgaaggc cttcggggcc 
gccgccgacg ggttcggctt ctccgagggg gtctccctcg tcctgctcga acggctctcc
gaggccgaaa gcaacggcca cgaggtgttg gctgtcatcc gtggctccgc cctcaaccag 
gacggggcca gcaacggtct cgccgcgccg aacgggaccg cccagcgcaa ggtgatccgg 
caggcgctac gaaactgcgg cctgaccccg gccgacgtgg acgccgtgga ggcgcacggc 
accggcacca cgctcggcga cccgatcgag gccaacgccc tgctggacac ctacggccgt 
gaccgggatc cggaccaccc gctgtggctg gggtcggtga agtcgaacat cggccacacg 
caggcggcgg cgggcgtcac cgggctgctc aagatggtgc tggcactgcg ccacgaggaa 
ctgcccgcca ccctgcacgt cgacgagccc accccgcacg tggactggtc ctcgggagcg 
gtacgcctgg cgacccgggg ccggccgtgg cggcggggtg accggccgag gcgggccggg 
gtgtcggcgt tcggcatcag cgggaccaac gcccacgtga tcgtcgagga ggcacccgag 
cggaccaccg agcgcaccgt cggcggcgac gtcggcccgg tcccgctcgt ggtgtccgcc 
cggtcggcgg cggcgctacg ggcccaggcg gcccaggtcg ccgagctggt ggagggctcc 
gacgtcgggc tggcggaggt cgggcggagc ctggccgtga cccgggcgcg acacgagcac 
cgggcggcgg tggtggcgtc gacccgggcc gaggcggtgc gggggctgcg cgaggtcgcg 
gcggtcgaac cgcgcggcga ggacaccgtc accggggtcg ccgagacgtc cgggcgcacc 
gtcgtcttcc tcttcccggg acaggggtcc cagtgggtcg ggatgggcgc ggagctgctg 
gactcggcac cggcgttcgc cgacacgatc cgcgcctgcg acgaggcgat ggcaccgttg 
caggactggt cggtctccga cgtgctccgg caggagccgg gggcaccggg actggaccgg 
gtcgacgtgg tgcagccggt gctgttcgcg gtgatggtgt cgttggcgcg gttgtggcag 
tcgtacgggg tcacccccgc tgcggtggtg gggcactcgc agggggagat cgccgccgcc 
cacgtggcgg gtgcgctctc cctcgccgac gcggcgaggc tggtggtggg ccgcagccgg
ttgctgcggt cgctgtccgg gggcggcggc atgagcgccg tcgcgctcgg tgaggccgag 
gtacgccgcc gactgcggtc gtgggaggac cggatctccg tggccgccgt caacggaccc 
cggtcggtgg tggtggccgg ggaaccggag gcgctgcggg agtggggacg ggagcgggag 
gccgagggcg tacgggtccg cgagatcgac gtcgactacg cctcgcactc gccgcagatc 
gacagggtcc gtgacgaact cctgacggtc acgggggaga tcgagccccg gtcggcggag 
atcaccttct actcgacggt cgacgtccgt gctgtcgacg gcaccgacct ggacgcgggg 
tactggtacc gcaacctgcg ggagacggtc cggttcgccg acgcgatgac ccggttggcc 
gactcgggat acgacgcgtt cgtcgaggtc agcccgcatc cggtggtggt gtcggcggtc 
gccgaggcgg tcgaggaggc aggtgtcgag gacgccgtcg tcgtcggcac cctgtcccgg
ggcgacggcg gaccgggggc gttcctgcgg tcggcggcca ccgcccactg cgccggtgtg 
gacgtcgact ggacgcccgc cctcccggga gctgcgacga tcccgttgcc gacgtacccg 
ttccaacgga agccgtactg gctgcggtcg tctgctcccg cccccgcctc ccacgatctc 
gcctaccggg tgtcctggac gccgatcacc ccgcccgggg acggcgtact cgacggcgac 
tggctggtgg tgcaccccgg gggcagcacc ggatgggtcg acgggttggc ggcggcgatc 
accgccggcg gtggccgggt cgtcgcccac ccggtggact ccgtgacctc ccggaccggc 25800

ctggccgagg cgctcgcccg gcgggacggc acgttccggg gggtgctgtc gtgggtggcg 25860 

accgacgaac ggcacgtcga ggccggtgcg gtcgccctgc tgaccctggc gcaggcgttg 25920 

ggtgacgccg gaatcgacgc accactgtgg tgcctgaccc aggaggcggt ccgtaccccc 25980 

gtcgacggtg acctggcccg accggcgcag gccgccctgc acggtttcgc ccaggtcgcc 26040 

cggctggagc tggcccgccg cttcggtggg gtgctcgacc tgcccgccac cgtcgacgcc 26100 

gccgggacgc gtctggtcgc ggcggtcctc gccggcggcg gcgaggacgt cgtcgccgtc 26160 

cgtggcgacc gtctctacgg ccgtcgcctg gtcagggcga ccctgccgcc gcccggcggg 26220 

gggttcaccc cgcacggcac cgtcctggtc accggcgcgg ccggtccggt gggcggtcgg 26280

ctggcccggt ggctcgccga acggggtgcc acccgactcg tcctgcccgg cgcacacccg 26340

ggcgaggagt tgctgaccgc gatccgggcc gccggtgcca ccgccgtggt gtgcgaaccg 26400 

gaggcggagg cactgcgtac ggcgatcggc ggggagttgc cgaccgcgct cgtacacgcc 26460 

gagacgttga cgaacttcgc cggcgtcgcc gacgccgacc ccgaggactt cgccgccacc 26520 

gtcgcggcga agaccgcgct gccgacggtc ctggcggagg tgctcggcga ccaccgcctc 26580

gaacgggagg tctactgctc gtcggtggcc ggggtctggg gtggggtcgg catggccgcg 26640 

tacgccgccg gcagcgccta cctcgacgcc ctggtcgagc accgtcgcgc ccgggggcac 26700 

gccagcgcct cggtggcctg gaccccgtgg gccctgcccg gcgcggtcga cgacggtcgg 26760 

ctgcgcgagc gcggcctgcg cagcctcgac gtggccgacg ccctcgggac gtgggaacgt 26820 

ctgctccgcg ccggtgcggt gtcggtggcc gtcgccgacg tcgactggtc ggtcttcaca 26880 

gagggtttcg cggccatccg gccgaccccg ctcttcgacg aactcctcga ccggcgcggg 26940 

gaccccgacg gcgcgcccgt cgaccggccg ggggagccgg cgggcgagtg gggtcgacga 27000 

atcgcggcgc tgtccccgca ggaacagcgg gagacgttgc tgaccctcgt cggcgagacg 27060 

gtcgcggagg tgctgggaca cgagaccggc accgagatca acacccgtcg ggccttcagc 27120 

gaactcggcc tcgactcgct gggctcgatg gccctgcgtc agcgcctggc ggcccgtacc 27180 

ggcctgcgga tgccggcctc gctggtcttc gaccacccga cggtcaccgc gctcgcgcgg 27240

tacctgcgtc gactggtcgt cggggactcc gacccgaccc cggtacgggt gttcggcccc 27300 

accgacgagg ccgaacccgt cgccgtggtc ggcatcggct gccggttccc cggcggcatc 27360 

gccacccccg aggacctctg gcgggtggtg tccgagggca cctccatcac caccggattc 27420 

cccaccgacc ggggctggga cctccggcgg ctctaccacc ccgacccgga ccaccccggc 27480

accagctacg tcgacagggg gggattcctc gacggggccc cggacttcga ccccgggttc 27540

ttcgggatca ccccccgcga ggcgctggcg atggacccgc agcagcggct caccctggag 27600

atcgcgtggg aggcggtgga acgggcgggc atcgacccgg agaccctcct cggcagcgac 27660

accggcgtct tcgtcggcat gaacggccag tcctacctgc aactgctgac cggggagggt 27720 

gaccggctca acggctacca ggggttgggc aactcggcga gcgtgctctc cggccgtgtc 27780 

gcctacacct tcgggtggga ggggccggcg ctgacggtgg acaccgcctg ctcgtcctcg 27840

ctggtcgcca tccacctcgc catgcagtcg ctgcgtcggg gtgagtgctc gctggcgttg 27900

gccggcgggg tgacggtcat ggccgacccg tacaccttcg tggacttcag cgcacagcgg 27960

gggctcgccg ccgacgggcg gtgcaaggcg ttctccgcgc aggccgacgg gttcgccctc 28020 

gccgagggcg tcgcggcgct cgtcctcgaa ccgttgtcca aggcgcggcg aaacggccac 28080 

caggtgctgg cggtgctgcg cggcagcgcc gtcaaccagg acggggccag caacggcctc 28140

gccgccccga acgggccgtc gcaggaacgg gtgatcaggc aggccctgac cgcctccggg 28200 

ctgcgtcccg ccgacgtcga catggtggag gcgcacggga cgggcaccga actcggcgac 28260

ccgatcgagg ccggggcgct catcgcggcg tacggccggg accgggaccg gccgctctgg 28320 

ctgggctcgg tgaagacgaa catcggccac acccaggccg ccgccggtgc cgccggggtg 28380 

atcaaggcgg tcctggcgat gcggcacggc gtactcccga ggtcgctgca cgccgacgag 28440 

ttgtccccgc acatcgactg ggcggacggg aaggtcgagg tgctccgcga ggcacgacag 28500 

tggccccccg gtgagcgccc ccgccgcgcc ggggtgtcct ccttcggcgt cagcgggacc 28560 

aacgcccacg tcatcgtcga ggaggcaccc gccgaaccgg accccgaacc ggttcccgcc 28620 

gccccgggcg ggcccctgcc cttcgtcctg cacggacgca gcgtccagac ggtccggtcc 28680 

caggcgcgga ccctcgccga acacctgcgc accaccggcc accgggacct cgccgacacc 28740 

gcccgtaccc tggccaccgg tcgcgcccgt ttcgacgtcc gggccgcagt gctcggcacc 28800 

gaccgggagg gtgtctgcgc cgccctcgac gcgctggcgc aggatcgccc ctcgcccgac 28860

gtcgtcgccc cggcggtctt cgccgcccgt acccccgtcc tggtcttccc cgggcagggg 28920 

tcgcagtggg tcggcatggc ccgtgacctg ctcgactcct ccgaggtgtt cgccgagtcg 28980 

atgggccggt gcgccgaggc gctgtcgccg tacaccgact gggacctgct cgacgtggtc 29040

cgtggggtcg gcgaccccga cccgtacgac cgggtggacg tgctccagcc ggtgctgttc 29100 

gcggtgatgg tgtcgctggc gcggttgtgg cagtcgtacg gggtgactcc gggtgcggtg 29160

gtgggtcact cgcaggggga gatcgccgcc gcgcacgtgg ctggtgcgtt gtcgttggcc 29220 

gacgccgcca gggtggtggc gttgcgcagc cgggtgctgc gggagctcga cgaccagggc 29280

ggcatggtgt cggtcggcac ctcccgcgcc gagttggact cggtcctgcg ccggtgggac 29340

gggcgggtcg cggtggcggc ggtgaacgga cccggcacgc tcgtggtggc cggacccacc 29400
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gccgaactgg acgagttcct cgcggtggcc gaggcccgcg agatgaggcc gcgtcggatc
gcggtgcgct acgcgtcgca ctccccggag gtggcccggg tcgaacagcg gctcgccgcc
gaactcggca ccgtcaccgc cgtcggcggc acggtcccgc tctactccac cgccaccggg
gacctcctcg acaccacagc catggacgcc gggtactggt accgcaacct gcgccaaccg
gtgctgttcg agcacgccgt ccgcagcctc ctggagcggg gattcgagac gttcatcgag
gtcagcccgc accctgtgct gctgatggcg gtcgaggaga ccgccgagga cgccgagcgc
ccggtcaccg gcgtgccgac gctgcgccgc gaccacgacg ggccgtcgga gttcctccgc
aacctcctgg gggcgcacgt gcacggggtc gacgtcgacc tgcgtccggc ggtcgcccac
ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc agcggctctg gcccaagccg
caccgcaggg ccgacacctc gtcgctgggg gtccgtgact cgacccaccc gctgctgcac
gccgcagtcg acgtacccgg tcacggcgga gcggtgttca ccgggcggct ctcccccgac
gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga acctggtgcc cggcagtgtc
ctggtcgacc tcgcgctcac cgccggggcc gacgtcggcg tgccggtgct ggaggaactc
gtcctgcagc agccgctggt gttgaccgcc gccggtgcgt tgctgcgcct gtcggtcggc
gccgccgacg aggacgggcg gcggccggtc gagatccacg ccgccgagga cgtctccgac
ccggccgagg cccggtggtc ggcgtacgcg accgggaccc tcgccgtcgg cgtggccggc
ggcggccggg acggcacaca gtggcccccg cccggcgcca ccgccctgac gttgaccgac
cactacgaca ccctcgccga actgggctac gagtacgggc cggcgttcca ggcgctgcgc
gccgcgtggc agcacggcga cgtggtctac gcggaggtgt ccctcgacgc cgtcgaggag
gggtacgcgt tcgacccggt gctgctcgac gccgtcgccc agaccttcgg cctgaccagt
cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca ccctgcacgc caccggggcc
actgcggtac gggtggtggc gacccccgcc ggaccggacg cggtggccct gcgggtcacc
gacccgaccg gtcagctcgt cgccacggtg gacgccctgg tcgtcaggga cgccggggcg
gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc gcctggagtg ggtacggctg
gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg cggccgacgg gctcgacgac
ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg tccgctaccg tcccgacggc
gacgacccga cggccgaggc ccgtcacggg gtgctctggg cggccacgct cgtgcgccgt
tggctcgacg acgaccggtg gcccgccacc accctggtgg tggccacgtc cgcaggggtc
gaggtctccc ccggggacga cgtgccgcgc cccggggccg ccgccgtgtg gggggtgctg
cgctgcgccc aggcggagtc cccggaccgc ttcgtgctcg tcgacggcga cccggagacg
cccccggcgg tgccggacaa tccgcagctc gcggtccgtg acggtgcggt gttcgtgcca
cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg accgggcgta ccggctggtg
cccggcaacg gcggctccat cgaggcagtg gccttcgccc ccgtccccga cgccgaccgg
cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca ccggcgtgaa cttccgtgac
gtcctgctcg cgctcggcat gtacccggaa ccggccgaga tgggcaccga ggcgtccggt
gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc ccggccaggc ggtgacgggc
ctgttccagg gggccttcgg gccggtggcg gtcgccgacc accggctcct caccccggtc
cccgacgggt ggcgggcggt ggacgccgca gccgtaccca tcgcgttcac caccgcccac
tacgcgctgc acgacctggc cgggttgcag gccgggcagt ccgtgctggt ccacgccgcc
gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc gggccggggc ggaggtgttc
gccacggcca gcccggccaa acacccgacg ctgcgggcgc tcggcctcga cgacgaccac
atcgcctcgt cccgggagag cgggttcggt gagcggttcg ccgcgcgtac cggggggcgg
ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc tcgacgagtc cgcgcggctg
ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg acctgcggcc ggcggagcag
ttccggggcc ggtacgtccc gttcgacctg gccgaggccg gtcccgatcg gctcggcgag
atcctggagg aggtcgtcgg tctgctggcc gccggtgccc tcgaccggtt gccggtgtcg
gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca tgagccgggg ccgacacgtg
ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg acggaacggt gctggtcacc
ggcgggaccg gcaccctggg gcggctggtc gcccgccacc tggtgaccgg gcacggcgta
ccccacctcc tggtggccag ccggcgcggt ccggcggccc cgggcgcggc cgagctgcgc
gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg cctgcgacac cgccgaccgg
gaggcgctcg cggcgctgct cgactcgatc cccgcggacc gtccgctgac cggggtggtg
cacaccgccg gggtcctggc cgacgggctg gtcacctcca tcgacgggac cgccaccgat
caggtcctgc gggccaaggt cgacgcggcg tggcacctgc acgacctgac ccgggacgcg
gacctgagct tcttcgtgct gttctcgtcg gcggcgtcgg tgctggccgg tcccgggcag
ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg ccgggcaacg gcgggccctc
ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc aggccagcga gatgaccagc
ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc tgccgaccga gcgggcgctg
gccctgttcg acgcggctct gcgcagcggc ggggaggtgc tgttcccgct gtctgtcgac
aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc tgcgcggcgc ggtccggtcc
acgccacggg ccgccaacag ggccgagacc ccgggccggg gcctgctcga ccgtctcgtc
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ggtgcacccg agaccgatca ggtggccgcg ctggccgagc tggtccgctc gcacgcggcg
gcggtcgccg gctacgactc ggccgaccag ctgcccgaac gcaaggcgtt caaggacctc
gggttcgact cgctggcggc ggtggagctg cgcaaccggc tcggcgtcac caccggcgta
cggctgccca gcacgctggt gttcgaccac ccgacaccgc tggcggtggc cgaacacctg
cggtcggagt tgttcgccga ctccgcgccg gacgtcgggg tcggtgcgcg cctcgacgac
ctggaacggg cgctcgacgc cctgcccgac gcgcagggac acgccgacgt cggggcccgc
ctggaggcgc tgctgcgccg gtggcagagc cgacgacccc cggagaccga gccagtgacg
atcagtgacg acgccagtga cgacgagctg ttctcgatgc tcgacaggcg tctcggcggg
ggaggggacg tctaggtgac aggtcgattc cgccccgcgg cagtggaccg taccgccctg
acaggtccac cgggttcgcg tcgcctccca cacccgacgg ccggggtatc cacggaaggg
atccgatgag cgagagcagc ggcatgaccg aggaccgcct ccggcgctat ctcaagcgca
ccgtcgccga actcgactcg gtgacaggtc ggctcgacga ggtcgagtac cgggcccgcg
aaccgatcgc cgtcgtcggc atggcctgcc ggttccccgg gggtgtggac tcgccggagg
cgttctggga gttcatccgc gacggtggtg acgcgatcgc cgaggcgccc acggaccgtg
gctggccgcc ggcaccgcga ccccgcctcg gtggtctcct cgcggagccg ggcgcgttcg
acgccgcctt cttcggcatc tcaccccgcg aggcgctcgc gacggacccc cagcagcgcc
tgatgctgga gatctcctgg gaggcgttgg agcgtgcggg tttcgacccg tcgagcctgc
gcggcagcgc cggtggcgtc ttcaccggtg tcggtgcggt ggactacgga cccaggccgg
acgaggcacc cgaggaggtg ctcggctacg tcggcatcgg caccgcctcc agcgtcgcct
ccggacgggt ggcgtacacc ctggggttgg agggtccagc cgtcaccgtc gacaccgcct
gctcctccgg gctcaccgcg gtgcacctgg cgatggagtc gctgcgccgc gacgagtgca
ccctggtcct cgccggtggg gtcaccgtga tgagcagccc gggtgcgttc accgagttcc
gcagccaggg cgggttggcc gaggacggcc gctgcaaacc gttctcccgc gccgccgacg
gcttcgggct cgccgagggg gccggggtcc tggtgctcca acggctgtcc gtcgcccggg
ccgagggccg gccggtgctg gccgtactgc gtggctcggc gatcaaccag gacggtgcca
gcaacgggct caccgcgccg agcggccccg cccagcggcg ggtgatcagg caggcgttgg
agcgggcgcg gctgcgtccc gtcgacgtgg actacgtgga ggcccacggc accggcaccc
ggctgggcga tccgatcgag gcgcacgccc tgctcgacac gtacggtgcc gaccgggaac
ccggccgccc gctctgggtc ggatcggtga agtccaacat cggtcacacc caggcggcgg
cgggggtggc cggggtgatg aagaccgtgc tggcgctgcg gcatcgggag atcccggcga
cgttgcactt cgacgagccc tcgccgcacg tcgactggga ccggggtgcg gtgtcggtgg
tgtccgagac ccggccctgg ccggtggggg agcgcccgcg ccgggcgggg gtgtcctcgt
tcggcatcag cggcaccaac gcgcacgtca tcgtcgagga ggcgccgagc ccgcaggcgg
ccgacctcga cccgaccccc ggcccggcaa ccggagcgac ccccggaacg gatgccgccc
ccaccgccga gccgggtgcg gaggcggtcg cactggtgtt ctccgcgcgc gacgagcggg
ccctgcgcgc ccaggcggcc cggctcgccg accgtctcac cgacgacccg gccccctcgt
tgcgcgacac cgccttcacc ctggtcaccc gccgtgccac ctgggagcat cgggcggtcg
tcgtcggcgg gggcgaggag gtcctcgccg gcctccgggc cgtcgccggg ggacgtcccg
tcgacggagc cgtcagcggg cgggcgcgcg ccggccgccg ggtggtgctg gtcttccccg
ggcagggcgc acagtggcag ggcatggccc gggacctgct gcggcagtcg ccgaccttcg
cggagtccat cgacgcctgc gagcgggcgc tcgccccgca cgtggactgg tcgctgcgcg
aggtgctcga cggcgagcag tcgttggacc ccgtcgacgt ggtgcagccg gtgctgttcg
cggtgatggt gtcgttggcg cggttgtggc agtcgtacgg ggtgactccg ggtgcggtgg
tgggtcactc gcagggggag atcgccgccg cgcacgtggc tggtgcgttg tcgttggccg
acgccgccag ggtggtggcg ttgcgcagcc gggtgctgcg ccgtctcggt ggtcacggcg
ggatggcgtc gttcgggctc caccccgacc aggccgccga gcggatcgcg cgcttcgcgg
gtgcgctgac tgtcgcctcg gtcaacggtc cccgttcggt ggtgctggcc ggggagaacg
gcccgttgga cgagctgatc gccgagtgcg aggccgaggg cgtgaccgcc cgtcggatcc
ccgtcgacta cgcctcacac tccccgcagg tggagtcgct gcgtgaggag ctgctcgccg
cactggccgg ggtccgtccg gtgtcggccg ggatccccct gtactcgacc ctgaccggtc
aggtcatcga aacggcgacg atggacgccg actactggtt cgccaacctc cgggagccgg
tgcgcttcca ggacgccacc aggcagctcg ccgaggcggg gttcgacgcc ttcgtcgagg
tcagcccgca cccggtgttg acagtcggtg tcgaggccac cctcgaggca gtgctgcccc
ccgacgcgga tccgtgtgtc acaggcaccc tgcgccgcga acgcggcggt ctcgcgcagt
tccacaccgc gctcgccgag gcgtacaccc ggggggtgga ggtcgactgg cgtaccgcag
tgggtgaggg acgcccggtc gacctgccgg tctacccgtt ccaacgacag aacttctggc
tcccggtccc cctgggccgg gtccccgaca ccggcgacga gtggcgttac cagctcgcct
ggcaccccgt cgacctcggg cggtcctccc tggccggacg ggtcctggtg gtgaccggag
cggcagtacc cccggcctgg acggacgtgg tccgcgacgg cctggaacag cgcggggcga
ccgtcgtgtt gtgcaccgcg cagtcgcgcg cccggatcgg cgccgcactc gacgccgtcg
acggcaccgc cctgtccact gtggtctctc tgctcgcgct cgccgagggc ggtgctgtcg
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acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 
tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 
cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 
ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca 
tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 
cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 
tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 
cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 
acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg
ccgaccgcga ccggttggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca
cggcggtgtt ccacgccgcc gggatctccc ggtccacagc ggtacaggag ctgaccgaga
gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct 
gtcccgagct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg 
ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc 
gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg 
gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg 
cgatcgagga gctgcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc 
tggaccggga gcggttcgtc gaactgttca ccgccgcccg ccgccggccc ctcttcgacg 
aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc 
ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg 
aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc 
gtgacctggg attcgactcc atgaccgccg tcgacctgcg gaaccggctc gcggcggtga 
ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg 
cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg 
tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc 
tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg 
cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc 
ccgagcggca cggcaccagc tactcccggc acggcgcgtt cctggacggg gcggccgact 
tcgacgcggc gttcttcggg atctcgccgc gtgaggcgtt ggcgatggat ccgcagcagc 
ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc 
tgcgcggtac ggacaccggt gtcttcctcg gcgctgcgta ccaggggtac ggccagaacg 
cgcaggtgcc gaaggagagt gagggttacc tgctcaccgg tggttcctcg gcggtcgcct 
ccggtcggat cgcgtacgtg ttggggttgg aggggccggc gatcactgtg gacacggcgt 
gttcgtcgtc gcttgtggcg ttgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg 
ggctcgcggt ggcgggtggg gtgtcggtga tggccggtcc ggaggtgttc accgagttct 
ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg 
ggttcggatt cgccgagggc gtcgctgtgg tgctcctgca gcggttgtcg gtggcggtgc 
gggaggggcg tcgggtgttg ggtgtggtgg tgggttcggc ggtgaatcag gatggggcga 
gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg 
gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc
ggttggggga tccggtggag ttgggggcgt tgttggggac gtatggggtg ggtcggggtg 
gggtgggtcc ggtggtggtg ggttcggtga aggcgaatgt gggtcatgtg caggcggcgg 
cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga 
tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg 
cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg 
cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg 
tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg 
tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac 
tgcacgacgc cgtcgacgac accgtcgccc tcccggcggt ggccgccacc ctcgccaccg 
gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg 
acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt 
cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc
gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt 
cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt 
tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt 
tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg 
cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc 
gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg 
ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca 
actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc 
actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg 
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cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc 40440

gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg 40500

acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg 40560

ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg 40620

ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg 40680

gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact 40740

ggcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga 40800

cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg 40860

actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg 40920

acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg 40980 

ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg 41040 

gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg 41100 

aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc 41160 

acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg 41220 

agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg 41280 

gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg 41340

acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt 41400 

ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 41460 

cgcggtacct ggcccggtcc ggggtgaccg atctcgtcct gctcagcagg agcggccccg 41520 

acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 41580 

tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 41640

aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 41700 

tcgaccggat cgacgaactg gagtcggtca gcgccgcgaa ggtgaccggg gcgcggctgc 41760

tcgacgagct ctgcccggac gccgacacct tcgtcctgtt ctcctcgggg gcgggagtgt 41820 

ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 41880 

accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 41940 

acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 42000 

cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 42060 

tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 42120 

tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 42180 

tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 42240 

tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 42300

actcgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 42360

ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 42420

agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 42480

ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 42540

agttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 42600

tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 42660

ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 42720

tcgcgcaacc cgggtacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 42780

gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac 42840 

actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc 42900

cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 42960

cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 43020

cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 43080

ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 43140

gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 43200

cgutggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 43260

agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg 43320

actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 43380

cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 43440

cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 43500

cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 43560

ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 43620 

gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct 43680

ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 43740 

gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 43800 

ggcggtgacc gaggcggcga tcgccgcggt gcccggggac ccgcaccggc gggcgctgtt 43860

caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 43920

ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 43980

ggtgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 44040
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ggtgggcgag cacacggtcg cggcgggcga cgaggtcgtc gtggtggtcg ccgccgccaa 
ccgtgacgcg ggggtcttcg ccgacccgga ccgcctcgac ccggaccggg ccgacgccga 
ccgggccctg tccgcccagc gcggtcaccc cggccggttg gaggagctgg tggtggtcct 
gaccaccgcc gcactgcgca gcgtcgccaa ggcgctgccc ggtctcaccg ccggtggccc 
ggtcgtcagg cgacgtcgtt caccggtcct gcgagccacc gcccactgcc cggtcgaact 
ctgaggtgcc tgcgatgcgc gtcgtcttct cctccatggc cagcaagagc cacctgttcg 
gtctcgttcc cctcgcctgg gccttccgcg cggcgggcca cgaggtacgg gtcgtcgcct 
caccggctct caccgacgac atcacggcgg ccggactgac ggccgtaccg gtcggcaccg 
acgtcgacct tgtcgacttc atgacccacg ccgggtacga catcatcgac tacgtccgca 
gcctggactt cagcgagcgg gacccggcca cctccacctg ggaccacctg ctcggcatgc 
agaccgtcct caccccgacc ttctacgccc tgatgagccc ggactcgctg gtcgagggca 
tgatctcctt ctgtcggtcg tggcgacccg actggtcgtc tggaccgcag accttcgccg 
cgtcgatcgc ggcgacggtg accggcgtgg cccacgcccg actcctgtgg ggacccgaca 
tcacggtacg ggcccggcag aagttcctcg ggctgctgcc cggacagccc gccgcccacc 
gggaggaccc cctcgccgag tggctcacct ggtctgtgga gaggttcggc ggccgggtgc 
cgcaggacgt cgaggagctg gtggtcgggc agtggacgat cgaccccgcc ccggtcggga 
tgcgcctcga caccgggctg aggacggtgg gcatgcgcta cgtcgactac aacggcccgt 
cggtggtgcc ggactggctg cacgacgagc cgacccgccg acgggtctgc ctcaccctgg 
gcatctccag ccgggagaac agcatcgggc aggtctccgt cgacgacctg ttgggtgcgc 
tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg 
cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga 
cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg 
gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg 
aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg 
aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg 
ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 
gggaacggac cgccgtcgga tgagcaccga cgccacccac gtccggctcg gccggtgcgc 
cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga 
cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga 
cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc 
cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 
cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 
cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt 
ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg 
ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 
ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca 
tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc 
gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc 
gggacagccg acggcactgc gctcggcggt ggaggcgtac gaggtgttct gcagagacct 
cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg 
ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt 
cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 
aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt 
gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg 
cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 
cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc 
gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 
cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 
cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg 
catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 
ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 
acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 
ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 
gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 
ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg 
gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 
catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 
accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 
acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 
ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 
ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt 
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acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 47760

ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 47820

tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 47880

ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 47940

cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 47981 

<210> 2 
<211> 48 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 2 

Met Gly Asp Arg Val Asn Gly His Ala Thr Pro Glu Ser Thr Gln Ser

1 5 10 15

Ala Ile Arg Phe Leu Thr Arg His Gly Gly Pro Pro Thr Ala Thr Asp

20 25 30 

Asp Val His Asp Trp Leu Ala His Arg Ala Ala Glu His Arg Leu Glu 
35 40 45 

<210> 3 
<211> 377 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 3 

Met Ala Val Gly Asp Arg Arg Arg Leu Gly Arg Glu Leu Gln Met Ala

1 5 10 15 

Arg Gly Leu Tyr Trp Gly Phe Gly Ala Asn Gly Asp Leu Tyr Ser Met 

20 25 30 

Leu Leu Ser Gly Arg Asp Asp Asp Pro Trp Thr Trp Tyr Glu Arg Leu 

35 40 45 

Arg Ala Ala Gly Arg Gly Pro Tyr Ala Ser Arg Ala Gly Thr Trp Val

50 55 60 

Val Gly Asp His Arg Thr Ala Ala Glu Val Leu Ala Asp Pro Gly Phe 
65 70 75 80 

Thr His Gly Pro Pro Asp Ala Ala Arg Trp Met Gln Val Ala His Cys

85 90 95 

Pro Ala Ala Ser Trp Ala Gly Pro Phe Arg Glu Phe Tyr Ala Arg Thr 

100 105 110 

Glu Asp Ala Ala Ser Val Thr Val Asp Ala Asp Trp Leu Gln Gln Arg

115 120 125 

Cys Ala Arg Leu Val Thr Glu Leu Gly Ser Arg Phe Asp Leu Val Asn 

130 135 140 

Asp Phe Ala Arg Glu Val Pro Val Leu Ala Leu Gly Thr Ala Pro Ala 
145 150 155 160 

Leu Lys Gly Val Asp Pro Asp Arg Leu Arg Ser Trp Thr Ser Ala Thr 

165 170 175

Arg Val Cys Leu Asp Ala Gln Val Ser Pro Gln Gln Leu Ala Val Thr

180 185 190

Glu Gln Ala Leu Thr Ala Leu Asp Glu Ile Asp Ala Val Thr Gly Gly

195 200 205

Arg Asp Ala Ala Val Leu Val Gly Val Val Ala Glu Leu Ala Ala Asn

210 215 220 

Thr Val Gly Asn Ala Val Leu Ala Val Thr Glu Leu Pro Glu Leu Ala 
225 230 235 240 

Ala Arg Leu Ala Asp Asp Pro Glu Thr Ala Thr Arg Val Val Thr Glu 

245 250 255 

Val Ser Arg Thr Ser Pro Gly Val His Leu Glu Arg Arg Thr Ala Ala 

260 265 270 

Ser Asp Arg Arg Val Gly Gly Val Asp Val Pro Thr Gly Gly Glu Val
275 280 285 
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<210> 4 
<211> 436 
<212> PRT 

<213> Micromonospora megalomicea
<400> 4 

Met Arg Val Val Phe Ser Ser Met Ala Val Asn Ser His Leu Phe Gly 

1 5 10 15

Leu Val Pro Leu Ala Ser Ala Phe Gln Ala Ala Gly His Glu Val Arg

20 25 30 

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Val Thr Gly Ala Gly Leu 

35 40 45 

Thr Ala Val Pro Val Gly Asp Asp Val Glu Leu Val Glu Trp His Ala 

50 55 60 

His Ala Gly Gln Asp Ile Val Glu Tyr Met Arg Thr Leu Asp Trp Val
65 70 75 80

Asp Gln Ser His Thr Thr Met Ser Trp Asp Asp Leu Leu Gly Met Gln

85 90 95

Thr Thr Phe Thr Pro Thr Phe Phe Ala Leu Met Ser Pro Asp Ser Leu

100 105 110

Ile Asp Gly Met Val Glu Phe Cys Arg Ser Trp Arg Pro Asp Trp Ile

115 120 125

Val Trp Glu Pro Leu Thr Phe Ala Ala Pro Ile Ala Ala Arg Val Thr

130 135 140

Gly Thr Pro His Ala Arg Met Leu Trp Gly Pro Asp Val Ala Thr Arg
145 150 155 160

Ala Arg Gln Ser Phe Leu Arg Leu Leu Ala His Gln Glu Val Glu His

165 170 175

Arg Glu Asp Pro Leu Ala Glu Trp Phe Asp Trp Thr Leu Arg Arg Phe

180 185 190

Gly Asp Asp Pro His Leu Ser Phe Asp Glu Glu Leu Val Leu Gly Gln

195 200 205

Trp Thr Val Asp Pro Ile Pro Glu Pro Leu Arg Ile Asp Thr Gly Val

210 215 220

Arg Thr Val Gly Met Arg Tyr Val Pro Tyr Asn Gly Pro Ser Val Val
225 230 235 240

Pro Ala Trp Leu Leu Arg Glu Pro Glu Arg Arg Arg Val Cys Leu Thr

245 250 255

Leu Gly Gly Ser Ser Arg Glu His Gly Ile Gly Gln Val Ser Ile Gly

260 265 270

Glu Met Leu Asp Ala Ile Ala Asp Ile Asp Ala Glu Phe Val Ala Thr

275 280 285

Phe Asp Asp Gln Gln Leu Val Gly Val Gly Ser Val Pro Ala Asn Val

290 295 300

Arg Thr Ala Gly Phe Val Pro Met Asn Val Leu Leu Pro Thr Cys Ala
305 310 315 320

Ala Thr Val His His Gly Gly Thr Gly Ser Trp Leu Thr Ala Ala Ile

325 330 335 
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<211> 390 
<212> PRT 

<213> Micromonospora megalomicea
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Tyr Gly Ala Asp Leu Leu Gly Phe Ser Gln Thr Glu Asp Ala Pro Leu

325 330 335

Gly Leu Ala Leu Phe Met Ile Ile Pro Phe Leu Ala Val Ser Leu Val

340 345 350 

Leu Ser Trp Leu Leu Tyr Arg Phe Val Glu Leu Pro Val Met Arg Asn 

355 360 365 

Trp Ala Arg Pro Ala Ser Ala Arg Arg Lys Pro Ala Thr Glu Pro Glu 

370 375 380 

Gln Thr Pro Ser Arg Arg
385 390 

<210> 6 
<211> 374 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 6 

Met Thr Thr Tyr Val Trp Ser Tyr Leu Leu Glu Tyr Glu Arg Glu Arg 

1 5 10 15 

Ala Asp Ile Leu Asp Ala Val Gln Lys Val Phe Ala Ser Gly Ser Leu

20 25 30

Ile Leu Gly Gln Ser Val Glu Asn Phe Glu Thr Glu Tyr Ala Arg Tyr

35 40 45

His Gly Ile Ala His Cys Val Gly Val Asp Asn Gly Thr Asn Ala Val

50 55 60 

Lys Leu Ala Leu Glu Ser Val Gly Val Gly Arg Asp Asp Glu Val Val 
65 70 75 80 

Thr Val Ser Asn Thr Ala Ala Pro Thr Val Leu Ala Ile Asp Glu Ile

85 90 95

Gly Ala Arg Pro Val Phe Val Asp Val Arg Asp Glu Asp Tyr Leu Met

100 105 110

Asp Thr Asp Leu Val Glu Ala Ala Val Thr Pro Arg Thr Lys Ala Ile

115 120 125

Val Pro Val His Leu Tyr Gly Gln Cys Val Asp Met Thr Ala Leu Arg

130 135 140

Glu Leu Ala Asp Arg Arg Gly Leu Lys Leu Val Glu Asp Cys Ala Gln
145 150 155 160 

Ala His Gly Ala Arg Arg Asp Gly Arg Leu Ala Gly Thr Met Ser Asp 

165 170 175 

Ala Ala Ala Phe Ser Phe Tyr Pro Thr Lys Val Leu Gly Ala Tyr Gly 

180 185 190 

Asp Gly Gly Ala Val Val Thr Asn Asp Asp Glu Thr Ala Arg Ala Leu 

195 200 205 

Arg Arg Leu Arg Tyr Tyr Gly Met Glu Glu Val Tyr Tyr Val Thr Arg 

210 215 220 

Thr Pro Gly His Asn Ser Arg Leu Asp Glu Val Gln Ala Glu Ile Leu
225 230 235 240 

Arg Arg Lys Leu Thr Arg Leu Asp Ala Tyr Val Ala Gly Arg Arg Ala 

245 250 255 

Val Ala Gln Arg Tyr Val Asp Gly Leu Ala Asp Leu Gln Asp Ser His

260 265 270

Gly Leu Glu Leu Pro Val Val Thr Asp Gly Asn Glu His Val Phe Tyr

275 280 285

Val Tyr Val Val Arg His Pro Arg Arg Asp Glu Ile Ile Lys Arg Leu

290 295 300

Arg Asp Gly Tyr Asp Ile Ser Leu Asn Ile Ser Tyr Pro Trp Pro Val
305 310 315 320

His Thr Met Thr Gly Phe Ala His Leu Gly Val Ala Ser Gly Ser Leu

325 330 335

Pro Val Thr Glu Arg Leu Ala Gly Glu Ile Phe Ser Leu Pro Met Tyr

340 345 350 
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Pro Ser Leu Pro His Asp Leu Gln Asp Arg Val Ile Glu Ala Val Arg

355 360 365

Glu Val Ile Thr Gly Leu
370

<210> 7 
<211> 257 
<212> PRT 
<213> Micromonospora megalomicea

<400> 7 



Met Pro Asn Ser His Ser Thr Thr Ser Ser Thr Asp Val Ala Pro Tyr

1 5 10 15

Glu Arg Ala Asp Ile Tyr His Asp Phe Tyr His Gly Arg Gly Lys Gly

20 25 30

Tyr Arg Ala Glu Ala Asp Ala Leu Val Glu Val Ala Arg Lys His Thr

35 40 45

Pro Gln Ala Ala Thr Leu Leu Asp Val Ala Cys Gly Thr Gly Ser His

50 55 60

Leu Val Glu Leu Ala Asp Ser Phe Arg Glu Val Val Gly Val Asp Leu
65 70 75 80

Ser Ala Ala Met Leu Ala Thr Ala Ala Arg Asn Asp Pro Gly Arg Glu

85 90 95

Leu His Gln Gly Asp Met Arg Asp Phe Ser Leu Asp Arg Arg Phe Asp

100 105 110

Val Val Thr Cys Met Phe Ser Ser Thr Gly Tyr Leu Val Asp Glu Ala

115 120 125

Glu Leu Asp Arg Ala Val Ala Asn Leu Ala Gly His Leu Ala Pro Gly

130 135 140

Gly Thr Leu Val Val Glu Pro Trp Trp Phe Pro Glu Thr Phe Arg Pro
145 150 155 160

Gly Trp Val Gly Ala Asp Leu Val Thr Ser Gly Asp Arg Arg Ile Ser

165 170 175

Arg Met Ser His Thr Val Pro Ala Gly Leu Pro Asp Arg Thr Ala Ser

180 185 190

Arg Met Thr Ile His Tyr Thr Val Gly Ser Pro Glu Ala Gly Ile Glu

195 200 205

His Phe Thr Glu Val His Val Met Thr Leu Phe Ala Arg Ala Ala Tyr

210 215 220

Glu Gln Ala Phe Gln Arg Ala Gly Leu Ser Cys Ser Tyr Val Gly His
225 230 235 240

Asp Leu Phe Ser Pro Gly Leu Phe Val Gly Val Ala Ala Glu Pro Gly

245 250 255

Arg



<210> 8 
<211> 201 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 8 

Met Arg Val Glu Glu Leu Gly Ile Glu Gly Val Phe Thr Phe Thr Pro

1 5 10 15

Gln Thr Phe Ala Asp Glu Arg Gly Val Phe Gly Thr Ala Tyr Gln Glu

20 25 30 

Asp Val Phe Val Ala Ala Leu Gly Arg Pro Leu Phe Pro Val Ala Gln

35 40 45 

Val Ser Thr Thr Arg Ser Arg Arg Gly Val Val Arg Gly Val His Phe 

50 55 60

Thr Thr Met Pro Gly Ser Met Ala Lys Tyr Val Tyr Cys Ala Arg Gly
65 70 75 80

Arg Ala Met Asp Phe Ala Val Asp Ile Arg Pro Gly Ser Pro Thr Phe

85 90 95 

Gly Arg Ala Glu Pro Val Glu Leu Ser Ala Glu Ser Met Val Gly Leu 

100 105 110 

Tyr Leu Pro Val Gly Met Gly His Leu Phe Val Ser Leu Glu Asp Asp 

115 120 125 

Thr Thr Leu Val Tyr Leu Met Ser Ala Gly Tyr Val Pro Asp Lys Glu 

130 135 140

Arg Ala Val His Pro Leu Asp Pro Glu Leu Ala Leu Pro Ile Pro Ala
145 150 155 160

Asp Leu Asp Leu Val Met Ser Glu Arg Asp Arg Val Ala Pro Thr Leu 

165 170 175 

Arg Glu Ala Arg Asp Gln Gly Ile Leu Pro Asp Tyr Ala Ala Cys Arg

180 185 190

Ala Ala Ala His Arg Val Val Arg Thr 
195 200 

<210> 9 
<211> 328 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 9 

Met Val Val Leu Gly Ala Ser Gly Phe Leu Gly Ser Ala Val Thr His 

1 5 10 15

Ala Leu Ala Asp Leu Pro Val Arg Val Arg Leu Val Ala Arg Arg Glu 

20 25 30 

Val Val Val Pro Ser Gly Ala Val Ala Asp Tyr Glu Thr His Arg Val 

35 40 45 

Asp Leu Thr Glu Pro Gly Ala Leu Ala Glu Val Val Ala Asp Ala Arg 

50 55 60 

Ala Val Phe Pro Phe Ala Ala Gln Ile Arg Gly Thr Ser Gly Trp Arg
65 70 75 80

Ile Ser Glu Asp Asp Val Val Ala Glu Arg Thr Asn Val Gly Leu Val

85 90 95

Arg Asp Leu Ile Ala Val Leu Ser Arg Ser Pro His Ala Pro Val Val

100 105 110

Val Phe Pro Gly Ser Asn Thr Gln Val Gly Arg Val Thr Ala Gly Arg

115 120 125 

Val Ile Asp Gly Ser Glu Gln Asp His Pro Glu Gly Val Tyr Asp Arg

130 135 140

Gln Lys His Thr Gly Glu Gln Leu Leu Lys Glu Ala Thr Ala Ala Gly
145 150 155 160

Ala Ile Arg Ala Thr Ser Leu Arg Leu Pro Pro Val Phe Gly Val Pro

165 170 175

Ala Ala Gly Thr Ala Asp Asp Arg Gly Val Val Ser Thr Met Ile Arg

180 185 190

Arg Ala Leu Thr Gly Gln Pro Leu Thr Met Trp His Asp Gly Thr Val

195 200 205 

Arg Arg Glu Leu Leu Tyr Val Thr Asp Ala Ala Arg Ala Phe Val Thr 

210 215 220 

Ala Leu Asp His Ala Asp Ala Leu Ala Gly Arg His Phe Leu Leu Gly 
225 230 235 240 

Thr Gly Arg Ser Trp Pro Leu Gly Glu Val Phe Gln Ala Val Ser Arg

245 250 255

Ser Val Ala Arg His Thr Gly Glu Asp Pro Val Pro Val Val Ser Val

260 265 270

Pro Pro Pro Ala His Met Asp Pro Ser Asp Leu Arg Ser Val Glu Val

275 280 285

Asp Pro Ala Arg Phe Thr Ala Val Thr Gly Trp Arg Ala Thr Val Thr

290 295 300

Met Ala Glu Ala Val Asp Arg Thr Val Ala Ala Leu Ala Pro Arg Arg 
305 310 315 320 

Ala Ala Ala Pro Ser Glu Pro Ser 

325 

<210> 10 
<211> 330 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 10 




























Met 


Gly 


Thr 


Thr 


Gly 


Ala 


Gly 


Ser 


Ala 


Arg 


Val 


Arg 


Val 


Gly Arg 


Ser 


1 








5 










10 














Ala 


Leu 


His 


Thr 


Ser 


Arg 


Leu 


Trp 


Leu 


Gly 


Thr 


Val 


Asn 


Phe 


Ser 


Gly 


















25 










30 






Val 


Thr 


Asp 


Asp Asp 


Ala 


Leu 


Arg 


Leu 


Met 


Asp 


His 


Ala 


Leu 


Glu 






35 










40 


















Ara 


Gly 
50 


Val 


Asn 


Cys 


He 


Asp 

55 


Thr 


Ala 


Asp 


He 


Tyr 
60 


Gly 


Tro 

rib ^k> ^mT 


A m 


Leu 


Tvr 


Lys 


Gly 


His 


Thr 


Glu 


Glu 

VJ x u 


Leu 


Val 


Gly Arg 


Trp 


Phe 


Ala 


Gin 




65 










70 










75 










p n 
o u 


Gly 

Jr 


Gly Arg 


Arg 


Glu 


Glu 


Thr 


Val 


Leu 


Ala 


Thr 


Lys 


Val 


Gly 

Jr 


Ser 


Glu 










85 










90 










95 




Met 


Ser 


Glu 


Arc 
100 


Val 


Asn 


Asp 


Gly 


Gly 
^ j. 

105 


Leu 

> 


Ser 


Ala 


Ara 


His 
110 


He 


v ax 


Ala 


Ala 


Cys 


Glu 


Asn 


Ser 


Leu 


Arg 


Ara 


Leu 


Gly Val 


Asp 


His 


He 








115 










120 










125 








lie 


Tyr 
130 


Gin 


Thr 


His 


His 


He 
135 


Asp 


Arg 


Ala 


Ala 


Pro 
140 


Trp 


Asp 


Glu 


Val 


Trp 


Gin 


Ala 


Ala 


Glu 


His 


Leu 


Val 


Gly 

Jr 


Ser 


Gly 


Lys 


Val 


Gly 


Tyr 


Val 


145 










150 










155 










160 


Gly 


Ser 


Ser 


Asn 


Leu 
165 


Ala 


Gly 

mi. 


Trp 


His 


He 
170 


Ala 


Ala 


Ala 


Gin 


Glu 
175 


Ser 


Ala 


Ala 


Arg 


Arg 
180 


Asn 


Leu 


Leu 


Gly 


Met 
185 


He 


Ser 


His 


Gin 


Cys 
190 


Leu 


Tyr 


Asn 


Leu 


Ala 
195 


Val 


Arg 


His 


Pro 


Glu 
200 


Leu 


Asp 


Val 


Leu 


Pro 
205 


Ala 


Ala 


Gin 


Ala 


Tyr 
210 


Gly 


Val 


Gly 


Val 


Phe 
215 


Ala 


Trp 


Ser 


Pro 


Leu 
220 


His 


Gly 


Gly 


Leu 


Leu 


Ser 


Gly 


Val 


Leu 


Glu 


Lys 


Leu 


Ala 


Ala 


Gly 


Thr 


Ala 


Val 


Lys 


Ser 


225 










230 










235 










240 


Ala 


Gin 


Gly 


Arg 


Ala 
245 


Gin 


Val 


Leu 


Leu 


Pro 
250 


Ala 


Val 


Arg 


Pro 


Leu 
255 


Val 


Glu 


Ala 


Tyr 


Glu 


Asp 


Tyr 


Cys 


Arg 


Arg 


Leu 


Gly Ala 


Asp 


Pro 


Ala 


Glu 








260 










265 










270 






Val 


Gly 


Leu 


Ala 


Trp 


Val 


Leu 


Ser 


Arg 


Pro 


Gly 


He 


Leu 


Gly Ala 


Val 






275 










280 










285 








lie 


Gly 
290 


Pro 


Arg 


Thr 


Pro 


Glu 
295 


Gin 


Leu 


Asp 


Ser 


Ala 
300 


Leu 


Arg 


Ala 


Ala 


Glu 


Leu 


Thr 


Leu 


Gly 


Glu 


Glu 


Glu 


Leu 


Arg 


Glu 


Leu 


Glu 


Ala 


He 


Phe 


305 










310 










315 










320 


Pro 


Ala 


Pro 


Ala 


Val 


Asp 


Gly 


Pro 


Val 


Pro 















325 330 



<210> 11
<211> 417 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 11

Met Arg Val Leu Leu Thr Ser Phe Ala His Arg Thr His Phe Gln Gly

1 5 10 15

Leu Val Pro Leu Ala Trp Ala Leu His Thr Ala Gly His Asp Val Arg 

20 25 30 

Val Ala Ser Gln Pro Glu Leu Thr Asp Val Val Val Gly Ala Gly Leu

35 40 45

Thr Ser Val Pro Leu Gly Ser Asp His Arg Leu Phe Asp Ile Ser Pro

50 55 60

Glu Ala Ala Ala Gln Val His Arg Tyr Thr Thr Asp Leu Asp Phe Ala
65 70 75 80

Arg Arg Gly Pro Glu Leu Arg Ser Trp Glu Phe Leu His Gly Ile Glu

85 90 95 

Glu Ala Thr Ser Arg Phe Val Phe Pro Val Val Asn Asn Asp Ser Phe 

100 105 110 

Val Asp Glu Leu Val Glu Phe Ala Met Asp Trp Arg Pro Asp Leu Val 

115 120 125 

Leu Trp Glu Pro Phe Thr Phe Ala Gly Ala Val Ala Ala Lys Ala Cys 

130 135 140 

Gly Ala Ala His Ala Arg Leu Leu Trp Gly Ser Asp Leu Thr Gly Tyr 
145 150 155 160 

Phe Arg Ser Arg Ser Gln Asp Leu Arg Gly Gln Arg Pro Ala Asp Asp

165 170 175 

Arg Pro Asp Pro Leu Gly Gly Trp Leu Thr Glu Val Ala Gly Arg Phe 

180 185 190 

Gly Leu Asp Tyr Ser Glu Asp Leu Ala Val Gly Gln Trp Ser Val Asp

195 200 205

Gln Leu Pro Glu Ser Phe Arg Leu Glu Thr Gly Leu Glu Ser Val His

210 215 220

Thr Arg Thr Leu Pro Tyr Asn Gly Ser Ser Val Val Pro Gln Trp Leu
225 230 235 240 

Arg Thr Ser Asp Gly Val Arg Arg Val Cys Phe Thr Gly Gly Tyr Ser 

245 250 255 

Ala Leu Gly Ile Thr Ser Asn Pro Gln Glu Phe Leu Arg Thr Leu Ala

260 265 270

Thr Leu Ala Arg Phe Asp Gly Glu Ile Val Val Thr Arg Ser Gly Leu

275 280 285 

Asp Pro Ala Ser Val Pro Asp Asn Val Arg Leu Val Asp Phe Val Pro 

290 295 300 

Met Asn Ile Leu Leu Pro Gly Cys Ala Ala Val Ile His His Gly Gly
305 310 315 320

Ala Gly Ser Trp Ala Thr Ala Leu His His Gly Val Pro Gln Ile Ser

325 330 335

Val Ala His Glu Trp Asp Cys Val Leu Arg Gly Gln Arg Thr Ala Glu

340 345 350 

Leu Gly Ala Gly Val Phe Leu Arg Pro Asp Glu Val Asp Ala Asp Thr 

355 360 365 

Leu Trp Gln Ala Leu Ala Thr Val Val Glu Asp Arg Ser His Ala Glu

370 375 380

Asn Ala Glu Lys Leu Arg Gln Glu Ala Leu Ala Ala Pro Thr Pro Ala
385 390 395 400

Glu Val Val Pro Val Leu Glu Ala Leu Ala His Gln His Arg Ala Asp

405 410 415 

Arg 



<210> 12 
<211> 313 
<212> PRT 

<213> Micromonospora megalomicea
<400> 12

Met Thr Arg His Val Thr Leu Leu Gly Val Ser Gly Phe Val Gly Ser

1 5 10 15

Ala Leu Leu Arg Glu Phe Thr Thr His Pro Leu Arg Leu Arg Ala Val

20 25 30

Ala Arg Thr Gly Ser Arg Asp Gln Pro Pro Gly Ser Ala Gly Ile Glu

35 40 45

His Leu Arg Val Asp Leu Leu Glu Pro Gly Arg Val Ala Gln Val Val

50 55 60

Ala Asp Thr Asp Val Val Val His Leu Val Ala Tyr Ala Ala Gly Gly
65 70 75 80

Ser Thr Trp Arg Ser Ala Ala Thr Val Pro Glu Ala Glu Arg Val Asn

85 90 95

Ala Gly Ile Met Arg Asp Leu Val Ala Ala Leu Arg Ala Arg Pro Gly

100 105 110

Pro Ala Pro Val Leu Leu Phe Ala Ser Thr Thr Gln Ala Ala Asn Pro

115 120 125

Ala Ala Pro Ser Arg Tyr Ala Gln His Lys Ile Glu Ala Glu Arg Ile

130 135 140

Leu Arg Gln Ala Thr Glu Asp Gly Val Val Asp Gly Val Ile Leu Arg
145 150 155 160

Leu Pro Ala Ile Tyr Gly His Ser Gly Pro Ser Gly Gln Thr Gly Arg

165 170 175

Gly Val Val Thr Ala Met Ile Arg Arg Ala Leu Ala Gly Glu Pro Ile

180 185 190

Thr Met Trp His Glu Gly Ser Val Arg Arg Asn Leu Leu His Val Glu

195 200 205

Asp Val Ala Thr Ala Phe Thr Ala Ala Leu His Asn His Glu Ala Leu

210 215 220

Val Gly Asp Val Trp Thr Pro Ser Ala Asp Glu Ala Arg Pro Leu Gly
225 230 235 240

Glu Ile Phe Glu Thr Val Ala Ala Ser Val Ala Arg Gln Thr Gly Asn

245 250 255

Pro Ala Val Pro Val Val Ser Val Pro Pro Pro Glu Asn Ala Glu Ala

260 265 270

Asn Asp Phe Arg Ser Asp Asp Phe Asp Ser Thr Glu Phe Arg Thr Leu

275 280 285

Thr Gly Trp His Pro Arg Val Pro Leu Ala Glu Gly Ile Asp Arg Thr

290 295 300

Val Ala Ala Leu Ile Ser Thr Lys Glu
305 310



<210> 13 
<211> 3546 
<212> PRT 

<213> Micromonospora megalomicea



<400> 13

Met Val Asp Val Pro Asp Leu Leu Gly Thr Arg Thr Pro His Pro Gly

1 5 10 15

Pro Leu Pro Phe Pro Trp Pro Leu Cys Gly His Asn Glu Pro Glu Leu

20 25 30

Arg Ala Arg Ala Arg Gln Leu His Ala Tyr Leu Glu Gly Ile Ser Glu

35 40 45

Asp Asp Val Val Ala Val Gly Ala Ala Leu Ala Arg Glu Thr Arg Ala

50 55 60

Gln Asp Gly Pro His Arg Ala Val Val Val Ala Ser Ser Val Thr Glu
65 70 75 80

Leu Thr Ala Ala Leu Ala Ala Leu Ala Gln Gly Arg Pro His Pro Ser

85 90 95

Val Val Arg Gly Val Ala Arg Pro Thr Ala Pro Val Val Phe Val Leu

100 105 110

Pro Gly Gln Gly Ala Gln Trp Pro Gly Met Ala Thr Arg Leu Leu Ala

115 120 125 

Glu Ser Pro Val Phe Ala Ala Ala Met Arg Ala Cys Glu Arg Ala Phe 

130 135 140 

Asp Glu Val Thr Asp Trp Ser Leu Thr Glu Val Leu Asp Ser Pro Glu
145 150 155 160

His Leu Arg Arg Val Glu Val Val Gln Pro Ala Leu Phe Ala Val Gln

165 170 175

Thr Ser Leu Ala Ala Leu Trp Arg Ser Phe Gly Val Arg Pro Asp Ala

180 185 190

Val Leu Gly His Ser Ile Gly Glu Leu Ala Ala Ala Glu Val Cys Gly

195 200 205

Ala Val Asp Val Glu Ala Ala Ala Arg Ala Ala Ala Leu Trp Ser Arg 

210 215 220 

Glu Met Val Pro Leu Val Gly Arg Gly Asp Met Ala Ala Val Ala Leu 
225 230 235 240 

Ser Pro Ala Glu Leu Ala Ala Arg Val Glu Arg Trp Asp Asp Asp Val

245 250 255 

Val Pro Ala Gly Val Asn Gly Pro Arg Ser Val Leu Leu Thr Gly Ala 

260 265 270 

Pro Glu Pro Ile Ala Arg Arg Val Ala Glu Leu Ala Ala Gln Gly Val

275 280 285

Arg Ala Gln Val Val Asn Val Ser Met Ala Ala His Ser Ala Gln Val

290 295 300 

Asp Ala Val Ala Glu Gly Met Arg Ser Ala Leu Thr Trp Phe Ala Pro 
305 310 315 320 

Gly Asp Ser Asp Val Pro Tyr Tyr Ala Gly Leu Thr Gly Gly Arg Leu 

325 330 335 

Asp Thr Arg Glu Leu Gly Ala Asp His Trp Pro Arg Ser Phe Arg Leu

340 345 350

Pro Val Arg Phe Asp Glu Ala Thr Arg Ala Val Leu Glu Leu Gln Pro

355 360 365

Gly Thr Phe Ile Glu Ser Ser Pro His Pro Val Leu Ala Ala Ser Leu

370 375 380

Gln Gln Thr Leu Asp Glu Val Gly Ser Pro Ala Ala Ile Val Pro Thr
385 390 395 400

Leu Gln Arg Asp Gln Gly Gly Leu Arg Arg Phe Leu Leu Ala Val Ala

405 410 415

Gln Ala Tyr Thr Gly Gly Val Thr Val Asp Trp Thr Ala Ala Tyr Pro

420 425 430 

Gly Val Thr Pro Gly His Leu Pro Ser Ala Val Ala Val Glu Thr Asp 

435 440 445 

Glu Gly Pro Ser Thr Glu Phe Asp Trp Ala Ala Pro Asp His Val Leu 

450 455 460 

Arg Ala Arg Leu Leu Glu Ile Val Gly Ala Glu Thr Ala Ala Leu Ala
465 470 475 480 

Gly Arg Glu Val Asp Ala Arg Ala Thr Phe Arg Glu Leu Gly Leu Asp 

485 490 495 

Ser Val Leu Ala Val Gln Leu Arg Thr Arg Leu Ala Thr Ala Thr Gly

500 505 510

Arg Asp Leu His Ile Ala Met Leu Tyr Asp His Pro Thr Pro His Ala

515 520 525

Leu Thr Glu Ala Leu Leu Arg Gly Pro Gln Glu Glu Pro Gly Arg Gly

530 535 540 

Glu Glu Thr Ala His Pro Thr Glu Ala Glu Pro Asp Glu Pro Val Ala 
545 550 555 560 

Val Val Ala Met Ala Cys Arg Leu Pro Gly Gly Val Thr Ser Pro Glu 

565 570 575 

Glu Phe Trp Glu Leu Leu Ala Glu Gly Arg Asp Ala Val Gly Gly Leu 

580 585 590 

Pro Thr Asp Arg Gly Trp Asp Leu Asp Ser Leu Phe His Pro Asp Pro

595 600 605

Thr Arg Ser Gly Thr Ala His Gln Arg Ala Gly Gly Phe Leu Thr Gly

610 615 620 

Ala Thr Ser Phe Asp Ala Ala Phe Phe Gly Leu Ser Pro Arg Glu Ala 
625 630 635 640 

Leu Ala Val Glu Pro Gln Gln Arg Ile Thr Leu Glu Leu Ser Trp Glu

645 650 655

Val Leu Glu Arg Ala Gly Ile Pro Pro Thr Ser Leu Arg Thr Ser Arg

660 665 670

Thr Gly Val Phe Val Gly Leu Ile Pro Gln Glu Tyr Gly Pro Arg Leu

675 680 685 

Ala Glu Gly Gly Glu Gly Val Glu Gly Tyr Leu Met Thr Gly Thr Thr 

690 695 700 

Thr Ser Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly 
705 710 715 720 

Pro Ala Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val

725 730 735

His Leu Ala Cys Gln Ser Leu Arg Arg Gly Glu Ser Thr Met Ala Leu

740 745 750 

Ala Gly Gly Val Thr Val Met Pro Thr Pro Gly Met Leu Val Asp Phe 

755 760 765 

Ser Arg Met Asn Ser Leu Ala Pro Asp Gly Arg Ser Lys Ala Phe Ser 

770 775 780 

Ala Ala Ala Asp Gly Phe Gly Met Ala Glu Gly Ala Gly Met Leu Leu 
785 790 795 800 

Leu Glu Arg Leu Ser Asp Ala Arg Arg His Gly His Pro Val Leu Ala

805 810 815

Val Ile Arg Gly Thr Ala Val Asn Ser Asp Gly Ala Ser Asn Gly Leu

820 825 830

Ser Ala Pro Asn Gly Arg Ala Gln Val Arg Val Ile Arg Gln Ala Leu

835 840 845

Ala Glu Ser Gly Leu Thr Pro His Thr Val Asp Val Val Glu Thr His

850 855 860 

Gly Thr Gly Thr Arg Leu Gly Asp Pro Ile Glu Ala Arg Ala Leu Ser
865 870 875 880

Asp Ala Tyr Gly Gly Asp Arg Glu His Pro Leu Arg Ile Gly Ser Val

885 890 895

Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Leu

900 905 910

Ile Lys Leu Val Leu Ala Met Gln Ala Gly Val Leu Pro Arg Thr Leu

915 920 925

His Ala Asp Glu Pro Ser Pro Glu Ile Asp Trp Ser Ser Gly Ala Ile

930 935 940

Ser Leu Leu Gln Glu Pro Ala Ala Trp Pro Ala Gly Glu Arg Pro Arg
945 950 955 960

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Ala

965 970 975

Ile Ile Glu Glu Ala Pro Pro Thr Gly Asp Asp Thr Arg Pro Asp Arg

980 985 990 

Met Gly Pro Val Val Pro Trp Val Leu Ser Ala Ser Thr Gly Glu Ala 

995 1000 1005 

Leu Arg Ala Arg Ala Ala Arg Leu Ala Gly His Leu Arg Glu His Pro

1010 1015 1020

Asp Gln Asp Leu Asp Asp Val Ala Tyr Ser Leu Ala Thr Gly Arg Ala
1025 1030 1035 1040 

Ala Leu Ala Tyr Arg Ser Gly Phe Val Pro Ala Asp Ala Ser Thr Ala 

1045 1050 1055 

Leu Arg Ile Leu Asp Glu Leu Ala Ala Gly Gly Ser Gly Asp Ala Val

1060 1065 1070

Thr Gly Thr Ala Arg Ala Pro Gln Arg Val Val Phe Val Phe Pro Gly
1075 1080 1085

Gln Gly Trp Gln Trp Ala Gly Met Ala Val Asp Leu Leu Asp Gly Asp

1090 1095 1100 

Pro Val Phe Ala Ser Val Leu Arg Glu Cys Ala Asp Ala Leu Glu Pro 
1105 1110 1115 1120

Tyr Leu Asp Phe Glu Ile Val Pro Phe Leu Arg Ala Glu Ala Gln Arg

1125 1130 1135

Arg Thr Pro Asp His Thr Leu Ser Thr Asp Arg Val Asp Val Val Gln

1140 1145 1150

Pro Val Leu Phe Ala Val Met Val Ser Leu Ala Ala Arg Trp Arg Ala

1155 1160 1165

Tyr Gly Val Glu Pro Ala Ala Val Ile Gly His Ser Gln Gly Glu Ile

1170 1175 1180

Ala Ala Ala Cys Val Ala Gly Ala Leu Ser Leu Asp Asp Ala Ala Arg 
1185 1190 1195 1200

Ala Val Ala Leu Arg Ser Arg Val Ile Ala Thr Met Pro Gly Asn Gly

1205 1210 1215

Ala Met Ala Ser Ile Ala Ala Ser Val Asp Glu Val Ala Ala Arg Ile

1220 1225 1230

Asp Gly Arg Val Glu Ile Ala Ala Val Asn Gly Pro Arg Ala Val Val

1235 1240 1245

Val Ser Gly Asp Arg Asp Asp Leu Asp Arg Leu Val Ala Ser Cys Thr

1250 1255 1260 

Val Glu Gly Val Arg Ala Lys Arg Leu Pro Val Asp Tyr Ala Ser His 
1265 1270 1275 1280 

Ser Ser His Val Glu Ala Val Arg Asp Ala Leu His Ala Glu Leu Gly 

1285 1290 1295 

Glu Phe Arg Pro Leu Pro Gly Phe Val Pro Phe Tyr Ser Thr Val Thr 

1300 1305 1310 

Gly Arg Trp Val Glu Pro Ala Glu Leu Asp Ala Gly Tyr Trp Phe Arg 

1315 1320 1325 

Asn Leu Arg His Arg Val Arg Phe Ala Asp Ala Val Arg Ser Leu Ala 

1330 1335 1340

Asp Gln Gly Tyr Thr Thr Phe Leu Glu Val Ser Ala His Pro Val Leu
1345 1350 1355 1360

Thr Thr Ala Ile Glu Glu Ile Gly Glu Asp Arg Gly Gly Asp Leu Val

1365 1370 1375

Ala Val His Ser Leu Arg Arg Gly Ala Gly Gly Pro Val Asp Phe Gly

1380 1385 1390 

Ser Ala Leu Ala Arg Ala Phe Val Ala Gly Val Ala Val Asp Trp Glu 

1395 1400 1405 

Ser Ala Tyr Gln Gly Ala Gly Ala Arg Arg Val Pro Leu Pro Thr Tyr

1410 1415 1420

Pro Phe Gln Arg Glu Arg Phe Trp Leu Glu Pro Asn Pro Ala Arg Arg
1425 1430 1435 1440

Val Ala Asp Ser Asp Asp Val Ser Ser Leu Arg Tyr Arg Ile Glu Trp

1445 1450 1455

His Pro Thr Asp Pro Gly Glu Pro Gly Arg Leu Asp Gly Thr Trp Leu 

1460 1465 1470 

Leu Ala Thr Tyr Pro Gly Arg Ala Asp Asp Arg Val Glu Ala Ala Arg 

1475 1480 1485 

Gln Ala Leu Glu Ser Ala Gly Ala Arg Val Glu Asp Leu Val Val Glu

1490 1495 1500 

Pro Arg Thr Gly Arg Val Asp Leu Val Arg Arg Leu Asp Ala Val Gly 
1505 1510 1515 1520 

Pro Val Ala Gly Val Leu Cys Leu Phe Ala Val Ala Glu Pro Ala Ala

1525 1530 1535

Glu His Ser Pro Leu Ala Val Thr Ser Leu Ser Asp Thr Leu Asp Leu

1540 1545 1550

Thr Gln Ala Val Ala Gly Ser Gly Arg Glu Cys Pro Ile Trp Val Val

1555 1560 1565

Thr Glu Asn Ala Val Ala Val Gly Pro Phe Glu Arg Leu Arg Asp Pro

1570 1575 1580

Ala His Gly Ala Leu Trp Ala Leu Gly Arg Val Val Ala Leu Glu Asn 
1585 1590 1595 1600 

Pro Ala Val Trp Gly Gly Leu Val Asp Val Pro Ser Gly Ser Val Ala 

1605 1610 1615

Glu Leu Ser Arg His Leu Gly Thr Thr Leu Ser Gly Ala Gly Glu Asp

1620 1625 1630

Gln Val Ala Leu Arg Pro Asp Gly Thr Tyr Ala Arg Arg Trp Cys Arg

1635 1640 1645

Ala Gly Ala Gly Gly Thr Gly Arg Trp Gln Pro Arg Gly Thr Val Leu

1650 1655 1660

Val Thr Gly Gly Thr Gly Gly Val Gly Arg His Val Ala Arg Trp Leu
1665 1670 1675 1680

Ala Arg Gln Gly Thr Pro Cys Leu Val Leu Ala Ser Arg Arg Gly Pro

1685 1690 1695 

Asp Ala Asp Gly Val Glu Glu Leu Leu Thr Glu Leu Ala Asp Leu Gly 

1700 1705 1710 

Thr Arg Ala Thr Val Thr Ala Cys Asp Val Thr Asp Arg Glu Gln Leu

1715 1720 1725

Arg Ala Leu Leu Ala Thr Val Asp Asp Glu His Pro Leu Ser Ala Val

1730 1735 1740

Phe His Val Ala Ala Thr Leu Asp Asp Gly Thr Val Glu Thr Leu Thr
1745 1750 1755 1760

Gly Asp Arg Ile Glu Arg Ala Asn Arg Ala Lys Val Leu Gly Ala Arg

1765 1770 1775 

Asn Leu His Glu Leu Thr Arg Asp Ala Asp Leu Asp Ala Phe Val Leu 

1780 1785 1790 

Phe Ser Ser Ser Thr Ala Ala Phe Gly Ala Pro Gly Leu Gly Gly Tyr 

1795 1800 1805 

Val Pro Gly Asn Ala Tyr Leu Asp Gly Leu Ala Gln Gln Arg Arg Ser

1810 1815 1820

Glu Gly Leu Pro Ala Thr Ser Val Ala Trp Gly Thr Trp Ala Gly Ser
1825 1830 1835 1840

Gly Met Ala Glu Gly Pro Val Ala Asp Arg Phe Arg Arg His Gly Val

1845 1850 1855

Met Glu Met His Pro Asp Gln Ala Val Glu Gly Leu Arg Val Ala Leu

1860 1865 1870

Val Gln Gly Glu Val Ala Pro Ile Val Val Asp Ile Arg Trp Asp Arg

1875 1880 1885

Phe Leu Leu Ala Tyr Thr Ala Gln Arg Pro Thr Arg Leu Phe Asp Thr

1890 1895 1900 

Leu Asp Glu Ala Arg Arg Ala Ala Pro Gly Pro Asp Ala Gly Pro Gly 
1905 1910 1915 1920 

Val Ala Ala Leu Ala Gly Leu Pro Val Gly Glu Arg Glu Lys Ala Val

1925 1930 1935

Leu Asp Leu Val Arg Thr His Ala Ala Ala Val Leu Gly His Ala Ser

1940 1945 1950

Ala Glu Gln Val Pro Val Asp Arg Ala Phe Ala Glu Leu Gly Val Asp

1955 1960 1965

Ser Leu Ser Ala Leu Glu Leu Arg Asn Arg Leu Thr Thr Ala Thr Gly

1970 1975 1980

Val Arg Leu Ala Thr Thr Thr Val Phe Asp His Pro Asp Val Arg Thr 
1985 1990 1995 2000 

Leu Ala Gly His Leu Ala Ala Glu Leu Gly Gly Gly Ser Gly Arg Glu 

2005 2010 2015 

Arg Pro Gly Gly Glu Ala Pro Thr Val Ala Pro Thr Asp Glu Pro Ile

2020 2025 2030

Ala Ile Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val Asp Ser Pro

2035 2040 2045

Glu Gln Leu Trp Glu Leu Ile Val Ser Gly Arg Asp Thr Ala Ser Ala
2050 2055 2060

Ala Pro Gly Asp Arg Ser Trp Asp Pro Ala Glu Leu Met Val Ser Asp
2065 2070 2075 2080 

Thr Thr Gly Thr Arg Thr Ala Phe Gly Asn Phe Met Pro Gly Ala Gly 

2085 2090 2095 

Glu Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

2100 2105 2110

Met Asp Pro Gln Gln Arg His Ala Leu Glu Thr Thr Trp Glu Ala Leu

2115 2120 2125

Glu Asn Ala Gly Ile Arg Pro Glu Ser Leu Arg Gly Thr Asp Thr Gly

2130 2135 2140

Val Phe Val Gly Met Ser His Gln Gly Tyr Ala Thr Gly Arg Pro Lys
2145 2150 2155 2160

Pro Glu Asp Glu Val Asp Gly Tyr Leu Leu Thr Gly Asn Thr Ala Ser

2165 2170 2175

Val Ala Ser Gly Arg Ile Ala Tyr Val Leu Gly Leu Glu Gly Pro Ala

2180 2185 2190

Ile Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Leu His Val

2195 2200 2205 

Ala Ala Gly Ser Leu Arg Ser Gly Asp Cys Gly Leu Ala Val Ala Gly 

2210 2215 2220 

Gly Val Ser Val Met Ala Gly Pro Glu Val Phe Arg Glu Phe Ser Arg 
2225 2230 2235 2240 

Gln Gly Ala Leu Ala Pro Asp Gly Arg Cys Lys Pro Phe Ser Asp Glu

2245 2250 2255

Ala Asp Gly Phe Gly Leu Gly Glu Gly Ser Ala Phe Val Val Leu Gln

2260 2265 2270

Arg Leu Ser Val Ala Val Arg Glu Gly Arg Arg Val Leu Gly Val Val

2275 2280 2285

Val Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala

2290 2295 2300

Pro Ser Gly Val Ala Gln Gln Arg Val Ile Arg Arg Ala Trp Gly Arg
2305 2310 2315 2320

Ala Gly Val Ser Gly Gly Asp Val Gly Val Val Glu Ala His Gly Thr 

2325 2330 2335 

Gly Thr Arg Leu Gly Asp Pro Val Glu Leu Gly Ala Leu Leu Gly Thr 

2340 2345 2350 

Tyr Gly Val Gly Arg Gly Gly Val Gly Pro Val Val Val Gly Ser Val 

2355 2360 2365 

Lys Ala Asn Val Gly His Val Gln Ala Ala Ala Gly Val Val Gly Val

2370 2375 2380

Ile Lys Val Val Leu Gly Leu Gly Arg Gly Leu Val Gly Pro Met Val
2385 2390 2395 2400 

Cys Arg Gly Gly Leu Ser Gly Leu Val Asp Trp Ser Ser Gly Gly Leu 

2405 2410 2415 

Val Val Ala Asp Gly Val Arg Gly Trp Pro Val Gly Val Asp Gly Val 

2420 2425 2430 

Arg Arg Gly Gly Val Ser Ala Phe Gly Val Ser Gly Thr Asn Ala His 

2435 2440 2445 

Val Val Val Ala Glu Ala Pro Gly Ser Val Val Gly Ala Glu Arg Pro 

2450 2455 2460 

Val Glu Gly Ser Ser Arg Gly Leu Val Gly Val Val Gly Gly Val Val 
2465 2470 2475 2480 

Pro Val Val Leu Ser Ala Lys Thr Glu Thr Ala Leu His Ala Gln Ala

2485 2490 2495 

Arg Arg Leu Ala Asp His Leu Glu Thr His Pro Asp Val Pro Met Thr 

2500 2505 2510 

Asp Val Val Trp Thr Leu Thr Gln Ala Arg Gln Arg Phe Asp Arg Arg

2515 2520 2525

Ala Val Leu Leu Ala Ala Asp Arg Thr Gln Ala Val Glu Arg Leu Arg

2530 2535 2540

Gly Leu Ala Gly Gly Glu Pro Gly Thr Gly Val Val Ser Gly Val Ala
2545 2550 2555 2560

Ser Gly Gly Gly Val Val Phe Val Phe Pro Gly Gln Gly Gly Gln Trp

2565 2570 2575 

Val Gly Met Ala Arg Gly Leu Leu Ser Val Pro Val Phe Val Glu Ser 

2580 2585 2590 

Val Val Glu Cys Asp Ala Val Val Ser Ser Val Val Gly Phe Ser Val 

2595 2600 2605 

Leu Gly Val Leu Glu Gly Arg Ser Gly Ala Pro Ser Leu Asp Arg Val 

2610 2615 2620

Asp Val Val Gln Pro Val Leu Phe Val Val Met Val Ser Leu Ala Arg
2625 2630 2635 2640

Leu Trp Arg Trp Cys Gly Val Val Pro Ala Ala Val Val Gly His Ser

2645 2650 2655

Gln Gly Glu Ile Ala Ala Ala Val Val Ala Gly Val Leu Ser Val Gly

2660 2665 2670 

Asp Gly Ala Arg Val Val Ala Leu Arg Ala Arg Ala Leu Arg Ala Leu 

2675 2680 2685 

Ala Gly His Gly Gly Met Ala Ser Val Arg Arg Gly Arg Asp Asp Val 

2690 2695 2700 

Gln Lys Leu Leu Asp Ser Gly Pro Trp Thr Gly Lys Leu Glu Ile Ala
2705 2710 2715 2720

Ala Val Asn Gly Pro Asp Ala Val Val Val Ser Gly Asp Pro Arg Ala

2725 2730 2735

Val Thr Glu Leu Val Glu His Cys Asp Gly Ile Gly Val Arg Ala Arg

2740 2745 2750

Thr Ile Pro Val Asp Tyr Ala Ser His Ser Ala Gln Val Glu Ser Leu

2755 2760 2765

Arg Glu Glu Leu Leu Ser Val Leu Ala Gly Ile Glu Gly Arg Pro Ala

2770 2775 2780

Thr Val Pro Phe Tyr Ser Thr Leu Thr Gly Gly Phe Val Asp Gly Thr
2785 2790 2795 2800

Glu Leu Asp Ala Asp Tyr Trp Tyr Arg Asn Leu Arg His Pro Val Arg

2805 2810 2815 

Phe His Ala Ala Val Glu Ala Leu Ala Ala Arg Asp Leu Thr Thr Phe 

2820 2825 2830 

Val Glu Val Ser Pro His Pro Val Leu Ser Met Ala Val Gly Glu Thr 

2835 2840 2845 

Leu Ala Asp Val Glu Ser Ala Val Thr Val Gly Thr Leu Glu Arg Asp 

2850 2855 2860 

Thr Asp Asp Val Glu Arg Phe Leu Thr Ser Leu Ala Glu Ala His Val 
2865 2870 2875 2880 

His Gly Val Pro Val Asp Trp Ala Ala Val Leu Gly Ser Gly Thr Leu 

2885 2890 2895 

Val Asp Leu Pro Thr Tyr Pro Phe Gln Gly Arg Arg Phe Trp Leu His

2900 2905 2910 

Pro Asp Arg Gly Pro Arg Asp Asp Val Ala Asp Trp Phe His Arg Val 

2915 2920 2925 

Asp Trp Thr Ala Thr Ala Thr Asp Gly Ser Ala Arg Leu Asp Gly Arg 

2930 2935 2940 

Trp Leu Val Val Val Pro Glu Gly Tyr Thr Asp Asp Gly Trp Val Val 
2945 2950 2955 2960 

Glu Val Arg Ala Ala Leu Ala Ala Gly Gly Ala Glu Pro Val Val Thr 

2965 2970 2975 

Thr Val Glu Glu Val Thr Asp Arg Val Gly Asp Ser Asp Ala Val Val 

2980 2985 2990 

Ser Met Leu Gly Leu Ala Asp Asp Gly Ala Ala Glu Thr Leu Ala Leu 

2995 3000 3005 

Leu Arg Arg Leu Asp Ala Gln Ala Ser Thr Thr Pro Leu Trp Val Val

3010 3015 3020

Thr Val Gly Ala Val Ala Pro Ala Gly Pro Val Gln Arg Pro Glu Gln
3025 3030 3035 3040

Ala Thr Val Trp Gly Leu Ala Leu Val Ala Ser Leu Glu Arg Gly His

3045 3050 3055

Arg Trp Thr Gly Leu Leu Asp Leu Pro Gln Thr Pro Asp Pro Gln Leu

3060 3065 3070

Arg Pro Arg Leu Val Glu Ala Leu Ala Gly Ala Glu Asp Gln Val Ala

3075 3080 3085

Val Arg Ala Asp Ala Val His Ala Arg Arg Ile Val Pro Thr Pro Val

3090 3095 3100

Thr Gly Ala Gly Pro Tyr Thr Ala Pro Gly Gly Thr Ile Leu Val Thr
3105 3110 3115 3120 

Gly Gly Thr Ala Gly Leu Gly Ala Val Thr Ala Arg Trp Leu Ala Glu 

3125 3130 3135 

Arg Gly Ala Glu His Leu Ala Leu Val Ser Arg Arg Gly Pro Gly Thr 

3140 3145 3150 

Ala Gly Val Asp Glu Val Val Arg Asp Leu Thr Gly Leu Gly Val Arg 

3155 3160 3165 

Val Ser Val His Ser Cys Asp Val Gly Asp Arg Glu Ser Val Gly Ala 

3170 3175 3180 

Leu Val Gln Glu Leu Thr Ala Ala Gly Asp Val Val Arg Gly Val Val
3185 3190 3195 3200

His Ala Ala Gly Leu Pro Gln Gln Val Pro Leu Thr Asp Met Asp Pro

3205 3210 3215 

Ala Asp Leu Ala Asp Val Val Ala Val Lys Val Asp Gly Ala Val His 

3220 3225 3230 

Leu Ala Asp Leu Cys Pro Glu Ala Glu Leu Phe Leu Leu Phe Ser Ser 

3235 3240 3245 

Gly Ala Gly Val Trp Gly Ser Ala Arg Gln Gly Ala Tyr Ala Ala Gly

3250 3255 3260 

Asn Ala Phe Leu Asp Ala Phe Ala Arg His Arg Arg Asp Arg Gly Leu 
3265 3270 3275 3280 

Pro Ala Thr Ser Val Ala Trp Gly Leu Trp Ala Ala Gly Gly Met Thr 

3285 3290 3295 

Gly Asp Gln Glu Ala Val Ser Phe Leu Arg Glu Arg Gly Val Arg Pro

3300 3305 3310 

Met Ser Val Pro Arg Ala Leu Glu Ala Leu Glu Arg Val Leu Thr Ala 

3315 3320 3325 

Gly Glu Thr Ala Val Val Val Ala Asp Val Asp Trp Ala Ala Phe Ala 

3330 3335 3340 

Glu Ser Tyr Thr Ser Ala Arg Pro Arg Pro Leu Leu His Arg Leu Val 
3345 3350 3355 3360 

Thr Pro Ala Ala Ala Val Gly Glu Arg Asp Glu Pro Arg Glu Gln Thr

3365 3370 3375 

Leu Arg Asp Arg Leu Ala Ala Leu Pro Arg Ala Glu Arg Ser Ala Glu 

3380 3385 3390 

Leu Val Arg Leu Val Arg Arg Asp Ala Ala Ala Val Leu Gly Ser Asp 

3395 3400 3405 

Ala Lys Ala Val Pro Ala Thr Thr Pro Phe Lys Asp Leu Gly Phe Asp 

3410 3415 3420 

Ser Leu Ala Ala Val Arg Phe Arg Asn Arg Leu Ala Ala His Thr Gly 
3425 3430 3435 3440 

Leu Arg Leu Pro Ala Thr Leu Val Phe Glu His Pro Asn Ala Ala Ala 

3445 3450 3455

Val Ala Asp Leu Leu His Asp Arg Leu Gly Glu Ala Gly Glu Pro Thr 

3460 3465 3470 

Pro Val Arg Ser Val Gly Ala Gly Leu Ala Ala Leu Glu Gln Ala Leu

3475 3480 3485

Pro Asp Ala Ser Asp Thr Glu Arg Val Glu Leu Val Glu Arg Leu Glu

3490 3495 3500 

Arg Met Leu Ala Gly Leu Arg Pro Glu Ala Gly Ala Gly Ala Asp Ala 
3505 3510 3515 3520 

Pro Thr Ala Gly Asp Asp Leu Gly Glu Ala Gly Val Asp Glu Leu Leu 
3525 3530 3535 

Asp Ala Leu Glu Arg Glu Leu Asp Ala Arg 

3540 3545 

<210> 14 
<211> 3562 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 14 

Met Thr Asp Asn Asp Lys Val Ala Glu Tyr Leu Arg Arg Ala Thr Leu 

1 5 10 15 

Asp Leu Arg Ala Ala Arg Lys Arg Leu Arg Glu Leu Gln Ser Asp Pro 

20 25 30 

Ile Ala Val Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val His Leu 

35 40 45 

Pro Gln His Leu Trp Asp Leu Leu Arg Gln Gly His Glu Thr Val Ser 

50 55 60 

Thr Phe Pro Thr Gly Arg Gly Trp Asp Leu Ala Gly Leu Phe His Pro 
65 70 75 80 

Asp Pro Asp His Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu 

85 90 95 

Asp Asp Val Ala Gly Phe Asp Ala Glu Phe Phe Gly Ile Ser Pro Arg 

100 105 110 

Glu Ala Thr Ala Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser 

115 120 125 

Trp Glu Leu Val Glu Ser Ala Gly Ile Asp Pro His Ser Leu Arg Gly 

130 135 140 

Thr Pro Thr Gly Val Phe Leu Gly Val Ala Arg Leu Gly Tyr Gly Glu 
145 150 155 160 

Asn Gly Thr Glu Ala Gly Asp Ala Glu Gly Tyr Ser Val Thr Gly Val 

165 170 175 

Ala Pro Ala Val Ala Ser Gly Arg Ile Ser Tyr Ala Leu Gly Leu Glu 

180 185 190 

Gly Pro Ser Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala 

195 200 205 

Leu His Leu Ala Val Glu Ser Leu Arg Leu Gly Glu Ser Ser Leu Ala 

210 215 220 

Val Val Gly Gly Ala Ala Val Met Ala Thr Pro Gly Val Phe Val Asp 
225 230 235 240 

Phe Ser Arg Gln Arg Ala Leu Ala Ala Asp Gly Arg Ser Lys Ala Phe 

245 250 255 

Gly Ala Ala Ala Asp Gly Phe Gly Phe Ser Glu Gly Val Ser Leu Val 

260 265 270 

Leu Leu Glu Arg Leu Ser Glu Ala Glu Ser Asn Gly His Glu Val Leu 

275 280 285 

Ala Val Ile Arg Gly Ser Ala Leu Asn Gln Asp Gly Ala Ser Asn Gly 

290 295 300 

Leu Ala Ala Pro Asn Gly Thr Ala Gln Arg Lys Val Ile Arg Gln Ala 
305 310 315 320 

Leu Arg Asn Cys Gly Leu Thr Pro Ala Asp Val Asp Ala Val Glu Ala 

325 330 335 

His Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ala Asn Ala Leu 

340 345 350 

Leu Asp Thr Tyr Gly Arg Asp Arg Asp Pro Asp His Pro Leu Trp Leu 

355 360 365 

Gly Ser Val Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val 

370 375 380 

Thr Gly Leu Leu Lys Met Val Leu Ala Leu Arg His Glu Glu Leu Pro 
385 390 395 400 

Ala Thr Leu His Val Asp Glu Pro Thr Pro His Val Asp Trp Ser Ser 
405 410 415 

Gly Ala Val Arg Leu Ala Thr Arg Gly Arg Pro Trp Arg Arg Gly Asp 

420 425 430 

Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Ile Ser Gly Thr Asn 

435 440 445 

Ala His Val Ile Val Glu Glu Ala Pro Glu Arg Thr Thr Glu Arg Thr 

450 455 460 

Val Gly Gly Asp Val Gly Pro Val Pro Leu Val Val Ser Ala Arg Ser 
465 470 475 480 

Ala Ala Ala Leu Arg Ala Gln Ala Ala Gln Val Ala Glu Leu Val Glu 

485 490 495 

Gly Ser Asp Val Gly Leu Ala Glu Val Gly Arg Ser Leu Ala Val Thr 

500 505 510 

Arg Ala Arg His Glu His Arg Ala Ala Val Val Ala Ser Thr Arg Ala 

515 520 525 

Glu Ala Val Arg Gly Leu Arg Glu Val Ala Ala Val Glu Pro Arg Gly 

530 535 540 

Glu Asp Thr Val Thr Gly Val Ala Glu Thr Ser Gly Arg Thr Val Val 
545 550 555 560 

Phe Leu Phe Pro Gly Gln Gly Ser Gln Trp Val Gly Met Gly Ala Glu 

565 570 575 

Leu Leu Asp Ser Ala Pro Ala Phe Ala Asp Thr Ile Arg Ala Cys Asp 

580 585 590 

Glu Ala Met Ala Pro Leu Gln Asp Trp Ser Val Ser Asp Val Leu Arg 

595 600 605 

Gln Glu Pro Gly Ala Pro Gly Leu Asp Arg Val Asp Val Val Gln Pro 

610 615 620 

Val Leu Phe Ala Val Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr 
625 630 635 640 

Gly Val Thr Pro Ala Ala Val Val Gly His Ser Gln Gly Glu Ile Ala 

645 650 655 

Ala Ala His Val Ala Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Leu 

660 665 670 

Val Val Gly Arg Ser Arg Leu Leu Arg Ser Leu Ser Gly Gly Gly Gly 

675 680 685 

Met Ser Ala Val Ala Leu Gly Glu Ala Glu Val Arg Arg Arg Leu Arg 

690 695 700 

Ser Trp Glu Asp Arg Ile Ser Val Ala Ala Val Asn Gly Pro Arg Ser 
705 710 715 720 

Val Val Val Ala Gly Glu Pro Glu Ala Leu Arg Glu Trp Gly Arg Glu 

725 730 735 

Arg Glu Ala Glu Gly Val Arg Val Arg Glu Ile Asp Val Asp Tyr Ala 

740 745 750 

Ser His Ser Pro Gln Ile Asp Arg Val Arg Asp Glu Leu Leu Thr Val 

755 760 765 

Thr Gly Glu Ile Glu Pro Arg Ser Ala Glu Ile Thr Phe Tyr Ser Thr 

770 775 780 

Val Asp Val Arg Ala Val Asp Gly Thr Asp Leu Asp Ala Gly Tyr Trp 
785 790 795 800 

Tyr Arg Asn Leu Arg Glu Thr Val Arg Phe Ala Asp Ala Met Thr Arg 

805 810 815 

Leu Ala Asp Ser Gly Tyr Asp Ala Phe Val Glu Val Ser Pro His Pro 

820 825 830 

Val Val Val Ser Ala Val Ala Glu Ala Val Glu Glu Ala Gly Val Glu 

835 840 845 

Asp Ala Val Val Val Gly Thr Leu Ser Arg Gly Asp Gly Gly Pro Gly 

850 855 860 

Ala Phe Leu Arg Ser Ala Ala Thr Ala His Cys Ala Gly Val Asp Val 
865 870 875 880 

Asp Trp Thr Pro Ala Leu Pro Gly Ala Ala Thr Ile Pro Leu Pro Thr 

885 890 895 
Tyr Pro Phe Gln Arg Lys Pro Tyr Trp Leu Arg Ser Ser Ala Pro Ala 

900 905 910 

Pro Ala Ser His Asp Leu Ala Tyr Arg Val Ser Trp Thr Pro Ile Thr 

915 920 925 

Pro Pro Gly Asp Gly Val Leu Asp Gly Asp Trp Leu Val Val His Pro 

930 935 940 

Gly Gly Ser Thr Gly Trp Val Asp Gly Leu Ala Ala Ala Ile Thr Ala 
945 950 955 960 

Gly Gly Gly Arg Val Val Ala His Pro Val Asp Ser Val Thr Ser Arg 

965 970 975 

Thr Gly Leu Ala Glu Ala Leu Ala Arg Arg Asp Gly Thr Phe Arg Gly 

980 985 990 

Val Leu Ser Trp Val Ala Thr Asp Glu Arg His Val Glu Ala Gly Ala 

995 1000 1005 

Val Ala Leu Leu Thr Leu Ala Gln Ala Leu Gly Asp Ala Gly Ile Asp 

1010 1015 1020 

Ala Pro Leu Trp Cys Leu Thr Gln Glu Ala Val Arg Thr Pro Val Asp 
1025 1030 1035 1040 

Gly Asp Leu Ala Arg Pro Ala Gln Ala Ala Leu His Gly Phe Ala Gln 

1045 1050 1055 

Val Ala Arg Leu Glu Leu Ala Arg Arg Phe Gly Gly Val Leu Asp Leu 

1060 1065 1070 

Pro Ala Thr Val Asp Ala Ala Gly Thr Arg Leu Val Ala Ala Val Leu 

1075 1080 1085 

Ala Gly Gly Gly Glu Asp Val Val Ala Val Arg Gly Asp Arg Leu Tyr 

1090 1095 1100 

Gly Arg Arg Leu Val Arg Ala Thr Leu Pro Pro Pro Gly Gly Gly Phe 
1105 1110 1115 1120 

Thr Pro His Gly Thr Val Leu Val Thr Gly Ala Ala Gly Pro Val Gly 

1125 1130 1135 

Gly Arg Leu Ala Arg Trp Leu Ala Glu Arg Gly Ala Thr Arg Leu Val 

1140 1145 1150 

Leu Pro Gly Ala His Pro Gly Glu Glu Leu Leu Thr Ala Ile Arg Ala 

1155 1160 1165 

Ala Gly Ala Thr Ala Val Val Cys Glu Pro Glu Ala Glu Ala Leu Arg 

1170 1175 1180 

Thr Ala Ile Gly Gly Glu Leu Pro Thr Ala Leu Val His Ala Glu Thr 
1185 1190 1195 1200 

Leu Thr Asn Phe Ala Gly Val Ala Asp Ala Asp Pro Glu Asp Phe Ala 

1205 1210 1215 

Ala Thr Val Ala Ala Lys Thr Ala Leu Pro Thr Val Leu Ala Glu Val 

1220 1225 1230 

Leu Gly Asp His Arg Leu Glu Arg Glu Val Tyr Cys Ser Ser Val Ala 

1235 1240 1245 

Gly Val Trp Gly Gly Val Gly Met Ala Ala Tyr Ala Ala Gly Ser Ala 

1250 1255 1260 

Tyr Leu Asp Ala Leu Val Glu His Arg Arg Ala Arg Gly His Ala Ser 
1265 1270 1275 1280 

Ala Ser Val Ala Trp Thr Pro Trp Ala Leu Pro Gly Ala Val Asp Asp 

1285 1290 1295 

Gly Arg Leu Arg Glu Arg Gly Leu Arg Ser Leu Asp Val Ala Asp Ala 

1300 1305 1310 

Leu Gly Thr Trp Glu Arg Leu Leu Arg Ala Gly Ala Val Ser Val Ala 

1315 1320 1325 

Val Ala Asp Val Asp Trp Ser Val Phe Thr Glu Gly Phe Ala Ala Ile 

1330 1335 1340 

Arg Pro Thr Pro Leu Phe Asp Glu Leu Leu Asp Arg Arg Gly Asp Pro 
1345 1350 1355 1360 

Asp Gly Ala Pro Val Asp Arg Pro Gly Glu Pro Ala Gly Glu Trp Gly 

1365 1370 1375 

Arg Arg Ile Ala Ala Leu Ser Pro Gln Glu Gln Arg Glu Thr Leu Leu 
1380 1385 1390 

Thr Leu Val Gly Glu Thr Val Ala Glu Val Leu Gly His Glu Thr Gly 

1395 1400 1405 

Thr Glu Ile Asn Thr Arg Arg Ala Phe Ser Glu Leu Gly Leu Asp Ser 

1410 1415 1420 

Leu Gly Ser Met Ala Leu Arg Gln Arg Leu Ala Ala Arg Thr Gly Leu 
1425 1430 1435 1440 

Arg Met Pro Ala Ser Leu Val Phe Asp His Pro Thr Val Thr Ala Leu 

1445 1450 1455 

Ala Arg Tyr Leu Arg Arg Leu Val Val Gly Asp Ser Asp Pro Thr Pro 

1460 1465 1470 

Val Arg Val Phe Gly Pro Thr Asp Glu Ala Glu Pro Val Ala Val Val 

1475 1480 1485 

Gly Ile Gly Cys Arg Phe Pro Gly Gly Ile Ala Thr Pro Glu Asp Leu 

1490 1495 1500 

Trp Arg Val Val Ser Glu Gly Thr Ser Ile Thr Thr Gly Phe Pro Thr 
1505 1510 1515 1520 

Asp Arg Gly Trp Asp Leu Arg Arg Leu Tyr His Pro Asp Pro Asp His 

1525 1530 1535 

Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu Asp Gly Ala Pro 

1540 1545 1550 

Asp Phe Asp Pro Gly Phe Phe Gly Ile Thr Pro Arg Glu Ala Leu Ala 

1555 1560 1565 

Met Asp Pro Gln Gln Arg Leu Thr Leu Glu Ile Ala Trp Glu Ala Val 

1570 1575 1580 

Glu Arg Ala Gly Ile Asp Pro Glu Thr Leu Leu Gly Ser Asp Thr Gly 
1585 1590 1595 1600 

Val Phe Val Gly Met Asn Gly Gln Ser Tyr Leu Gln Leu Leu Thr Gly 

1605 1610 1615 

Glu Gly Asp Arg Leu Asn Gly Tyr Gln Gly Leu Gly Asn Ser Ala Ser 

1620 1625 1630 

Val Leu Ser Gly Arg Val Ala Tyr Thr Phe Gly Trp Glu Gly Pro Ala 

1635 1640 1645 

Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Ile His Leu 

1650 1655 1660 

Ala Met Gln Ser Leu Arg Arg Gly Glu Cys Ser Leu Ala Leu Ala Gly 
1665 1670 1675 1680 

Gly Val Thr Val Met Ala Asp Pro Tyr Thr Phe Val Asp Phe Ser Ala 

1685 1690 1695 

Gln Arg Gly Leu Ala Ala Asp Gly Arg Cys Lys Ala Phe Ser Ala Gln 

1700 1705 1710 

Ala Asp Gly Phe Ala Leu Ala Glu Gly Val Ala Ala Leu Val Leu Glu 

1715 1720 1725 

Pro Leu Ser Lys Ala Arg Arg Asn Gly His Gln Val Leu Ala Val Leu 

1730 1735 1740 

Arg Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala 
1745 1750 1755 1760 

Pro Asn Gly Pro Ser Gln Glu Arg Val Ile Arg Gln Ala Leu Thr Ala 

1765 1770 1775 

Ser Gly Leu Arg Pro Ala Asp Val Asp Met Val Glu Ala His Gly Thr 

1780 1785 1790 

Gly Thr Glu Leu Gly Asp Pro Ile Glu Ala Gly Ala Leu Ile Ala Ala 

1795 1800 1805 

Tyr Gly Arg Asp Arg Asp Arg Pro Leu Trp Leu Gly Ser Val Lys Thr 

1810 1815 1820 

Asn Ile Gly His Thr Gln Ala Ala Ala Gly Ala Ala Gly Val Ile Lys 
1825 1830 1835 1840 

Ala Val Leu Ala Met Arg His Gly Val Leu Pro Arg Ser Leu His Ala 

1845 1850 1855 

Asp Glu Leu Ser Pro His Ile Asp Trp Ala Asp Gly Lys Val Glu Val 

1860 1865 1870 
Leu Arg Glu Ala Arg Gln Trp Pro Pro Gly Glu Arg Pro Arg Arg Ala 

1875 1880 1885 

Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala His Val Ile Val 

1890 1895 1900 

Glu Glu Ala Pro Ala Glu Pro Asp Pro Glu Pro Val Pro Ala Ala Pro 
1905 1910 1915 1920 

Gly Gly Pro Leu Pro Phe Val Leu His Gly Arg Ser Val Gln Thr Val 

1925 1930 1935 

Arg Ser Gln Ala Arg Thr Leu Ala Glu His Leu Arg Thr Thr Gly His 

1940 1945 1950 

Arg Asp Leu Ala Asp Thr Ala Arg Thr Leu Ala Thr Gly Arg Ala Arg 

1955 1960 1965 

Phe Asp Val Arg Ala Ala Val Leu Gly Thr Asp Arg Glu Gly Val Cys 

1970 1975 1980 

Ala Ala Leu Asp Ala Leu Ala Gln Asp Arg Pro Ser Pro Asp Val Val 
1985 1990 1995 2000 

Ala Pro Ala Val Phe Ala Ala Arg Thr Pro Val Leu Val Phe Pro Gly 

2005 2010 2015 

Gln Gly Ser Gln Trp Val Gly Met Ala Arg Asp Leu Leu Asp Ser Ser 

2020 2025 2030 

Glu Val Phe Ala Glu Ser Met Gly Arg Cys Ala Glu Ala Leu Ser Pro 

2035 2040 2045 

Tyr Thr Asp Trp Asp Leu Leu Asp Val Val Arg Gly Val Gly Asp Pro 

2050 2055 2060 

Asp Pro Tyr Asp Arg Val Asp Val Leu Gln Pro Val Leu Phe Ala Val 
2065 2070 2075 2080 

Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr Gly Val Thr Pro Gly 

2085 2090 2095 

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala 

2100 2105 2110 

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser 

2115 2120 2125 

Arg Val Leu Arg Glu Leu Asp Asp Gln Gly Gly Met Val Ser Val Gly 

2130 2135 2140 

Thr Ser Arg Ala Glu Leu Asp Ser Val Leu Arg Arg Trp Asp Gly Arg 
2145 2150 2155 2160 

Val Ala Val Ala Ala Val Asn Gly Pro Gly Thr Leu Val Val Ala Gly 

2165 2170 2175 

Pro Thr Ala Glu Leu Asp Glu Phe Leu Ala Val Ala Glu Ala Arg Glu 

2180 2185 2190 

Met Arg Pro Arg Arg Ile Ala Val Arg Tyr Ala Ser His Ser Pro Glu 

2195 2200 2205 

Val Ala Arg Val Glu Gln Arg Leu Ala Ala Glu Leu Gly Thr Val Thr 

2210 2215 2220 

Ala Val Gly Gly Thr Val Pro Leu Tyr Ser Thr Ala Thr Gly Asp Leu 
2225 2230 2235 2240 

Leu Asp Thr Thr Ala Met Asp Ala Gly Tyr Trp Tyr Arg Asn Leu Arg 

2245 2250 2255 

Gln Pro Val Leu Phe Glu His Ala Val Arg Ser Leu Leu Glu Arg Gly 

2260 2265 2270 

Phe Glu Thr Phe Ile Glu Val Ser Pro His Pro Val Leu Leu Met Ala 

2275 2280 2285 

Val Glu Glu Thr Ala Glu Asp Ala Glu Arg Pro Val Thr Gly Val Pro 

2290 2295 2300 

Thr Leu Arg Arg Asp His Asp Gly Pro Ser Glu Phe Leu Arg Asn Leu 
2305 2310 2315 2320 

Leu Gly Ala His Val His Gly Val Asp Val Asp Leu Arg Pro Ala Val 

2325 2330 2335 

Ala His Gly Arg Leu Val Asp Leu Pro Thr Tyr Pro Phe Asp Arg Gln 

2340 2345 2350 

Arg Leu Trp Pro Lys Pro His Arg Arg Ala Asp Thr Ser Ser Leu Gly 
2355 2360 2365 

Val Arg Asp Ser Thr His Pro Leu Leu His Ala Ala Val Asp Val Pro 

2370 2375 2380 

Gly His Gly Gly Ala Val Phe Thr Gly Arg Leu Ser Pro Asp Glu Gln 
2385 2390 2395 2400 

Gln Trp Leu Thr Gln His Val Val Gly Gly Arg Asn Leu Val Pro Gly 

2405 2410 2415 

Ser Val Leu Val Asp Leu Ala Leu Thr Ala Gly Ala Asp Val Gly Val 

2420 2425 2430 

Pro Val Leu Glu Glu Leu Val Leu Gln Gln Pro Leu Val Leu Thr Ala 

2435 2440 2445 

Ala Gly Ala Leu Leu Arg Leu Ser Val Gly Ala Ala Asp Glu Asp Gly 

2450 2455 2460 

Arg Arg Pro Val Glu Ile His Ala Ala Glu Asp Val Ser Asp Pro Ala 
2465 2470 2475 2480 

Glu Ala Arg Trp Ser Ala Tyr Ala Thr Gly Thr Leu Ala Val Gly Val 

2485 2490 2495 

Ala Gly Gly Gly Arg Asp Gly Thr Gln Trp Pro Pro Pro Gly Ala Thr 

2500 2505 2510 

Ala Leu Thr Leu Thr Asp His Tyr Asp Thr Leu Ala Glu Leu Gly Tyr 

2515 2520 2525 

Glu Tyr Gly Pro Ala Phe Gln Ala Leu Arg Ala Ala Trp Gln His Gly 

2530 2535 2540 

Asp Val Val Tyr Ala Glu Val Ser Leu Asp Ala Val Glu Glu Gly Tyr 
2545 2550 2555 2560 

Ala Phe Asp Pro Val Leu Leu Asp Ala Val Ala Gln Thr Phe Gly Leu 

2565 2570 2575 

Thr Ser Arg Ala Pro Gly Lys Leu Pro Phe Ala Trp Arg Gly Val Thr 

2580 2585 2590 

Leu His Ala Thr Gly Ala Thr Ala Val Arg Val Val Ala Thr Pro Ala 

2595 2600 2605 

Gly Pro Asp Ala Val Ala Leu Arg Val Thr Asp Pro Thr Gly Gln Leu 

2610 2615 2620 

Val Ala Thr Val Asp Ala Leu Val Val Arg Asp Ala Gly Ala Asp Arg 
2625 2630 2635 2640 

Asp Gln Pro Arg Gly Arg Asp Gly Asp Leu His Arg Leu Glu Trp Val 

2645 2650 2655 

Arg Leu Ala Thr Pro Asp Pro Thr Pro Ala Ala Val Val His Val Ala 

2660 2665 2670 

Ala Asp Gly Leu Asp Asp Leu Leu Arg Ala Gly Gly Pro Ala Pro Gln 

2675 2680 2685 

Ala Val Val Val Arg Tyr Arg Pro Asp Gly Asp Asp Pro Thr Ala Glu 

2690 2695 2700 

Ala Arg His Gly Val Leu Trp Ala Ala Thr Leu Val Arg Arg Trp Leu 
2705 2710 2715 2720 

Asp Asp Asp Arg Trp Pro Ala Thr Thr Leu Val Val Ala Thr Ser Ala 

2725 2730 2735 

Gly Val Glu Val Ser Pro Gly Asp Asp Val Pro Arg Pro Gly Ala Ala 

2740 2745 2750 

Ala Val Trp Gly Val Leu Arg Cys Ala Gln Ala Glu Ser Pro Asp Arg 

2755 2760 2765 

Phe Val Leu Val Asp Gly Asp Pro Glu Thr Pro Pro Ala Val Pro Asp 

2770 2775 2780 

Asn Pro Gln Leu Ala Val Arg Asp Gly Ala Val Phe Val Pro Arg Leu 
2785 2790 2795 2800 

Thr Pro Leu Ala Gly Pro Val Pro Ala Val Ala Asp Arg Ala Tyr Arg 

2805 2810 2815 

Leu Val Pro Gly Asn Gly Gly Ser Ile Glu Ala Val Ala Phe Ala Pro 

2820 2825 2830 

Val Pro Asp Ala Asp Arg Pro Leu Ala Pro Glu Glu Val Arg Val Ala 
2835 2840 2845 
Val Arg Ala Thr Gly Val Asn Phe Arg Asp Val Leu Leu Ala Leu Gly 

2850 2855 2860 

Met Tyr Pro Glu Pro Ala Glu Met Gly Thr Glu Ala Ser Gly Val Val 
2865 2870 2875 2880 

Thr Glu Val Gly Ser Gly Val Arg Arg Phe Thr Pro Gly Gln Ala Val 

2885 2890 2895 

Thr Gly Leu Phe Gln Gly Ala Phe Gly Pro Val Ala Val Ala Asp His 

2900 2905 2910 

Arg Leu Leu Thr Pro Val Pro Asp Gly Trp Arg Ala Val Asp Ala Ala 

2915 2920 2925 

Ala Val Pro Ile Ala Phe Thr Thr Ala His Tyr Ala Leu His Asp Leu 

2930 2935 2940 

Ala Gly Leu Gln Ala Gly Gln Ser Val Leu Val His Ala Ala Ala Gly 
2945 2950 2955 2960 

Gly Val Gly Met Ala Ala Val Ala Leu Ala Arg Arg Ala Gly Ala Glu 

2965 2970 2975 

Val Phe Ala Thr Ala Ser Pro Ala Lys His Pro Thr Leu Arg Ala Leu 

2980 2985 2990 

Gly Leu Asp Asp Asp His Ile Ala Ser Ser Arg Glu Ser Gly Phe Gly 

2995 3000 3005 

Glu Arg Phe Ala Ala Arg Thr Gly Gly Arg Gly Val Asp Val Val Leu 

3010 3015 3020 

Asn Ser Leu Thr Gly Asp Leu Leu Asp Glu Ser Ala Arg Leu Leu Ala 
3025 3030 3035 3040 

Asp Gly Gly Val Phe Val Glu Met Gly Lys Thr Asp Leu Arg Pro Ala 

3045 3050 3055 

Glu Gln Phe Arg Gly Arg Tyr Val Pro Phe Asp Leu Ala Glu Ala Gly 

3060 3065 3070 

Pro Asp Arg Leu Gly Glu Ile Leu Glu Glu Val Val Gly Leu Leu Ala 

3075 3080 3085 

Ala Gly Ala Leu Asp Arg Leu Pro Val Ser Val Trp Glu Leu Ser Ala 

3090 3095 3100 

Ala Pro Ala Ala Leu Thr His Met Ser Arg Gly Arg His Val Gly Lys 
3105 3110 3115 3120 

Leu Val Leu Thr Gln Pro Ala Pro Val His Pro Asp Gly Thr Val Leu 

3125 3130 3135 

Val Thr Gly Gly Thr Gly Thr Leu Gly Arg Leu Val Ala Arg His Leu 

3140 3145 3150 

Val Thr Gly His Gly Val Pro His Leu Leu Val Ala Ser Arg Arg Gly 

3155 3160 3165 

Pro Ala Ala Pro Gly Ala Ala Glu Leu Arg Ala Asp Val Glu Gly Leu 

3170 3175 3180 

Gly Ala Thr Ile Glu Ile Val Ala Cys Asp Thr Ala Asp Arg Glu Ala 
3185 3190 3195 3200 

Leu Ala Ala Leu Leu Asp Ser Ile Pro Ala Asp Arg Pro Leu Thr Gly 

3205 3210 3215 

Val Val His Thr Ala Gly Val Leu Ala Asp Gly Leu Val Thr Ser Ile 

3220 3225 3230 

Asp Gly Thr Ala Thr Asp Gln Val Leu Arg Ala Lys Val Asp Ala Ala 

3235 3240 3245 

Trp His Leu His Asp Leu Thr Arg Asp Ala Asp Leu Ser Phe Phe Val 

3250 3255 3260 

Leu Phe Ser Ser Ala Ala Ser Val Leu Ala Gly Pro Gly Gln Gly Val 
3265 3270 3275 3280 

Tyr Ala Ala Ala Asn Gly Val Leu Asn Ala Leu Ala Gly Gln Arg Arg 

3285 3290 3295 

Ala Leu Gly Leu Pro Ala Lys Ala Leu Gly Trp Gly Leu Trp Ala Gln 

3300 3305 3310 

Ala Ser Glu Met Thr Ser Gly Leu Gly Asp Arg Ile Ala Arg Thr Gly 

3315 3320 3325 

Val Ala Ala Leu Pro Thr Glu Arg Ala Leu Ala Leu Phe Asp Ala Ala 
3330 3335 3340 

Leu Arg Ser Gly Gly Glu Val Leu Phe Pro Leu Ser Val Asp Arg Ser 
3345 3350 3355 3360 

Ala Leu Arg Arg Ala Glu Tyr Val Pro Glu Val Leu Arg Gly Ala Val 

3365 3370 3375 

Arg Ser Thr Pro Arg Ala Ala Asn Arg Ala Glu Thr Pro Gly Arg Gly 

3380 3385 3390 

Leu Leu Asp Arg Leu Val Gly Ala Pro Glu Thr Asp Gln Val Ala Ala 

3395 3400 3405 

Leu Ala Glu Leu Val Arg Ser His Ala Ala Ala Val Ala Gly Tyr Asp 

3410 3415 3420 

Ser Ala Asp Gln Leu Pro Glu Arg Lys Ala Phe Lys Asp Leu Gly Phe 
3425 3430 3435 3440 

Asp Ser Leu Ala Ala Val Glu Leu Arg Asn Arg Leu Gly Val Thr Thr 

3445 3450 3455 

Gly Val Arg Leu Pro Ser Thr Leu Val Phe Asp His Pro Thr Pro Leu 

3460 3465 3470 

Ala Val Ala Glu His Leu Arg Ser Glu Leu Phe Ala Asp Ser Ala Pro 

3475 3480 3485 

Asp Val Gly Val Gly Ala Arg Leu Asp Asp Leu Glu Arg Ala Leu Asp 

3490 3495 3500 

Ala Leu Pro Asp Ala Gln Gly His Ala Asp Val Gly Ala Arg Leu Glu 
3505 3510 3515 3520 

Ala Leu Leu Arg Arg Trp Gln Ser Arg Arg Pro Pro Glu Thr Glu Pro 

3525 3530 3535 

Val Thr Ile Ser Asp Asp Ala Ser Asp Asp Glu Leu Phe Ser Met Leu 

3540 3545 3550 

Asp Arg Arg Leu Gly Gly Gly Gly Asp Val 
3555 3560 

<210> 15 
<211> 3201 

<212> PRT 

<213> Micromonospora megalomicea 

<400> 15 

Met Ser Glu Ser Ser Gly Met Thr Glu Asp Arg Leu Arg Arg Tyr Leu 

1 5 10 15 

Lys Arg Thr Val Ala Glu Leu Asp Ser Val Thr Gly Arg Leu Asp Glu 

20 25 30 

Val Glu Tyr Arg Ala Arg Glu Pro Ile Ala Val Val Gly Met Ala Cys 

35 40 45 

Arg Phe Pro Gly Gly Val Asp Ser Pro Glu Ala Phe Trp Glu Phe Ile 

50 55 60 

Arg Asp Gly Gly Asp Ala Ile Ala Glu Ala Pro Thr Asp Arg Gly Trp 
65 70 75 80 

Pro Pro Ala Pro Arg Pro Arg Leu Gly Gly Leu Leu Ala Glu Pro Gly 

85 90 95 

Ala Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala 

100 105 110 

Thr Asp Pro Gln Gln Arg Leu Met Leu Glu Ile Ser Trp Glu Ala Leu 

115 120 125 

Glu Arg Ala Gly Phe Asp Pro Ser Ser Leu Arg Gly Ser Ala Gly Gly 

130 135 140 

Val Phe Thr Gly Val Gly Ala Val Asp Tyr Gly Pro Arg Pro Asp Glu 
145 150 155 160 

Ala Pro Glu Glu Val Leu Gly Tyr Val Gly Ile Gly Thr Ala Ser Ser 

165 170 175 

Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly Pro Ala 

180 185 190 

Val Thr Val Asp Thr Ala Cys Ser Ser Gly Leu Thr Ala Val His Leu 
195 200 205 

Ala Met Glu Ser Leu Arg Arg Asp Glu Cys Thr Leu Val Leu Ala Gly 

210 215 220 

Gly Val Thr Val Met Ser Ser Pro Gly Ala Phe Thr Glu Phe Arg Ser 
225 230 235 240 

Gln Gly Gly Leu Ala Glu Asp Gly Arg Cys Lys Pro Phe Ser Arg Ala 

245 250 255 

Ala Asp Gly Phe Gly Leu Ala Glu Gly Ala Gly Val Leu Val Leu Gln 

260 265 270 

Arg Leu Ser Val Ala Arg Ala Glu Gly Arg Pro Val Leu Ala Val Leu 

275 280 285 

Arg Gly Ser Ala Ile Asn Gln Asp Gly Ala Ser Asn Gly Leu Thr Ala 

290 295 300 

Pro Ser Gly Pro Ala Gln Arg Arg Val Ile Arg Gln Ala Leu Glu Arg 
305 310 315 320 

Ala Arg Leu Arg Pro Val Asp Val Asp Tyr Val Glu Ala His Gly Thr 

325 330 335 

Gly Thr Arg Leu Gly Asp Pro Ile Glu Ala His Ala Leu Leu Asp Thr 

340 345 350 

Tyr Gly Ala Asp Arg Glu Pro Gly Arg Pro Leu Trp Val Gly Ser Val 

355 360 365 

Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Val 

370 375 380 

Met Lys Thr Val Leu Ala Leu Arg His Arg Glu Ile Pro Ala Thr Leu 
385 390 395 400 

His Phe Asp Glu Pro Ser Pro His Val Asp Trp Asp Arg Gly Ala Val 

405 410 415 

Ser Val Val Ser Glu Thr Arg Pro Trp Pro Val Gly Glu Arg Pro Arg 

420 425 430 

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Val 

435 440 445 

Ile Val Glu Glu Ala Pro Ser Pro Gln Ala Ala Asp Leu Asp Pro Thr 

450 455 460 

Pro Gly Pro Ala Thr Gly Ala Thr Pro Gly Thr Asp Ala Ala Pro Thr 
465 470 475 480 

Ala Glu Pro Gly Ala Glu Ala Val Ala Leu Val Phe Ser Ala Arg Asp 

485 490 495 

Glu Arg Ala Leu Arg Ala Gln Ala Ala Arg Leu Ala Asp Arg Leu Thr 

500 505 510 

Asp Asp Pro Ala Pro Ser Leu Arg Asp Thr Ala Phe Thr Leu Val Thr 

515 520 525 

Arg Arg Ala Thr Trp Glu His Arg Ala Val Val Val Gly Gly Gly Glu 

530 535 540 

Glu Val Leu Ala Gly Leu Arg Ala Val Ala Gly Gly Arg Pro Val Asp 
545 550 555 560 

Gly Ala Val Ser Gly Arg Ala Arg Ala Gly Arg Arg Val Val Leu Val 

565 570 575 

Phe Pro Gly Gln Gly Ala Gln Trp Gln Gly Met Ala Arg Asp Leu Leu 

580 585 590 

Arg Gln Ser Pro Thr Phe Ala Glu Ser Ile Asp Ala Cys Glu Arg Ala 

595 600 605 

Leu Ala Pro His Val Asp Trp Ser Leu Arg Glu Val Leu Asp Gly Glu 

610 615 620 

Gln Ser Leu Asp Pro Val Asp Val Val Gln Pro Val Leu Phe Ala Val 
625 630 635 640 

Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr Gly Val Thr Pro Gly 

645 650 655 

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala 

660 665 670 

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser 
675 680 685 
Arg Val Leu Arg Arg Leu Gly Gly His Gly Gly Met Ala Ser Phe Gly 

690 695 700 

Leu His Pro Asp Gln Ala Ala Glu Arg Ile Ala Arg Phe Ala Gly Ala 
705 710 715 720 

Leu Thr Val Ala Ser Val Asn Gly Pro Arg Ser Val Val Leu Ala Gly 

725 730 735 

Glu Asn Gly Pro Leu Asp Glu Leu Ile Ala Glu Cys Glu Ala Glu Gly 

740 745 750 

Val Thr Ala Arg Arg Ile Pro Val Asp Tyr Ala Ser His Ser Pro Gln 

755 760 765 

Val Glu Ser Leu Arg Glu Glu Leu Leu Ala Ala Leu Ala Gly Val Arg 

770 775 780 

Pro Val Ser Ala Gly Ile Pro Leu Tyr Ser Thr Leu Thr Gly Gln Val 
785 790 795 800 

Ile Glu Thr Ala Thr Met Asp Ala Asp Tyr Trp Phe Ala Asn Leu Arg 

805 810 815 

Glu Pro Val Arg Phe Gln Asp Ala Thr Arg Gln Leu Ala Glu Ala Gly 

820 825 830 

Phe Asp Ala Phe Val Glu Val Ser Pro His Pro Val Leu Thr Val Gly 

835 840 845 

Val Glu Ala Thr Leu Glu Ala Val Leu Pro Pro Asp Ala Asp Pro Cys 

850 855 860 

Val Thr Gly Thr Leu Arg Arg Glu Arg Gly Gly Leu Ala Gln Phe His 
865 870 875 880 

Thr Ala Leu Ala Glu Ala Tyr Thr Arg Gly Val Glu Val Asp Trp Arg 

885 890 895 

Thr Ala Val Gly Glu Gly Arg Pro Val Asp Leu Pro Val Tyr Pro Phe 

900 905 910 

Gln Arg Gln Asn Phe Trp Leu Pro Val Pro Leu Gly Arg Val Pro Asp 

915 920 925 

Thr Gly Asp Glu Trp Arg Tyr Gln Leu Ala Trp His Pro Val Asp Leu 

930 935 940 

Gly Arg Ser Ser Leu Ala Gly Arg Val Leu Val Val Thr Gly Ala Ala 
945 950 955 960 

Val Pro Pro Ala Trp Thr Asp Val Val Arg Asp Gly Leu Glu Gln Arg 

965 970 975 

Gly Ala Thr Val Val Leu Cys Thr Ala Gln Ser Arg Ala Arg Ile Gly 

980 985 990 

Ala Ala Leu Asp Ala Val Asp Gly Thr Ala Leu Ser Thr Val Val Ser 

995 1000 1005 

Leu Leu Ala Leu Ala Glu Gly Gly Ala Val Asp Asp Pro Ser Leu Asp 

1010 1015 1020 

Thr Leu Ala Leu Val Gln Ala Leu Gly Ala Ala Gly Ile Asp Val Pro 
1025 1030 1035 1040 

Leu Trp Leu Val Thr Arg Asp Ala Ala Ala Val Thr Val Gly Asp Asp 

1045 1050 1055 

Val Asp Pro Ala Gln Ala Met Val Gly Gly Leu Gly Arg Val Val Gly 

1060 1065 1070 

Val Glu Ser Pro Ala Arg Trp Gly Gly Leu Val Asp Leu Arg Glu Ala 

1075 1080 1085 

Asp Ala Asp Ser Ala Arg Ser Leu Ala Ala Ile Leu Ala Asp Pro Arg 

1090 1095 1100 

Gly Glu Glu Gln Phe Ala Ile Arg Pro Asp Gly Val Thr Val Ala Arg 
1105 1110 1115 1120 

Leu Val Pro Ala Pro Ala Arg Ala Ala Gly Thr Arg Trp Thr Pro Arg 

1125 1130 1135 

Gly Thr Val Leu Val Thr Gly Gly Thr Gly Gly Ile Gly Ala His Leu 

1140 1145 1150 

Ala Arg Trp Leu Ala Gly Ala Gly Ala Glu His Leu Val Leu Leu Asn 

1155 1160 1165 

Arg Arg Gly Ala Glu Ala Ala Gly Ala Ala Asp Leu Arg Asp Glu Leu 
1170 1175 1180 

Val Ala Leu Gly Thr Gly Val Thr Ile Thr Ala Cys Asp Val Ala Asp 
1185 1190 1195 1200 

Arg Asp Arg Leu Ala Ala Val Leu Asp Ala Ala Arg Ala Gln Gly Arg 

1205 1210 1215 

Val Val Thr Ala Val Phe His Ala Ala Gly Ile Ser Arg Ser Thr Ala 

1220 1225 1230 

Val Gln Glu Leu Thr Glu Ser Glu Phe Thr Glu Ile Thr Asp Ala Lys 

1235 1240 1245 

Val Arg Gly Thr Ala Asn Leu Ala Glu Leu Cys Pro Glu Leu Asp Ala 

1250 1255 1260 

Leu Val Leu Phe Ser Ser Asn Ala Ala Val Trp Gly Ser Pro Gly Leu 
1265 1270 1275 1280 

Ala Ser Tyr Ala Ala Gly Asn Ala Phe Leu Asp Ala Phe Ala Arg Arg 

1285 1290 1295 

Gly Arg Arg Ser Gly Leu Pro Val Thr Ser Ile Ala Trp Gly Leu Trp 

1300 1305 1310 

Ala Gly Gln Asn Met Ala Gly Thr Glu Gly Gly Asp Tyr Leu Arg Ser 

1315 1320 1325 

Gln Gly Leu Arg Ala Met Asp Pro Gln Arg Ala Ile Glu Glu Leu Arg 

1330 1335 1340 

Thr Thr Leu Asp Ala Gly Asp Pro Trp Val Ser Val Val Asp Leu Asp 
1345 1350 1355 1360 

Arg Glu Arg Phe Val Glu Leu Phe Thr Ala Ala Arg Arg Arg Pro Leu 

1365 1370 1375 

Phe Asp Glu Leu Gly Gly Val Arg Ala Gly Ala Glu Glu Thr Gly Gln 

1380 1385 1390 

Glu Ser Asp Leu Ala Arg Arg Leu Ala Ser Met Pro Glu Ala Glu Arg 

1395 1400 1405 

His Glu His Val Ala Arg Leu Val Arg Ala Glu Val Ala Ala Val Leu 

1410 1415 1420 

Gly His Gly Thr Pro Thr Val Ile Glu Arg Asp Val Ala Phe Arg Asp 
1425 1430 1435 1440 

Leu Gly Phe Asp Ser Met Thr Ala Val Asp Leu Arg Asn Arg Leu Ala 

1445 1450 1455 

Ala Val Thr Gly Val Arg Val Ala Thr Thr Ile Val Phe Asp His Pro 

1460 1465 1470 

Thr Val Asp Arg Leu Thr Ala His Tyr Leu Glu Arg Leu Val Gly Glu 

1475 1480 1485 

Pro Glu Ala Thr Thr Pro Ala Ala Ala Val Val Pro Gln Ala Pro Gly 

1490 1495 1500 

Glu Ala Asp Glu Pro Ile Ala Ile Val Gly Met Ala Cys Arg Leu Ala 
1505 1510 1515 1520 

Gly Gly Val Arg Thr Pro Asp Gln Leu Trp Asp Phe Ile Val Ala Asp 

1525 1530 1535 

Gly Asp Ala Val Thr Glu Met Pro Ser Asp Arg Ser Trp Asp Leu Asp 

1540 1545 1550 

Ala Leu Phe Asp Pro Asp Pro Glu Arg His Gly Thr Ser Tyr Ser Arg 

1555 1560 1565 

His Gly Ala Phe Leu Asp Gly Ala Ala Asp Phe Asp Ala Ala Phe Phe 

1570 1575 1580 

Gly Ile Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gln Gln Arg Gln 
1585 1590 1595 1600 

Val Leu Glu Thr Thr Trp Glu Leu Phe Glu Asn Ala Gly Ile Asp Pro 

1605 1610 1615 

His Ser Leu Arg Gly Thr Asp Thr Gly Val Phe Leu Gly Ala Ala Tyr 

1620 1625 1630 

Gln Gly Tyr Gly Gln Asn Ala Gln Val Pro Lys Glu Ser Glu Gly Tyr 

1635 1640 1645 

Leu Leu Thr Gly Gly Ser Ser Ala Val Ala Ser Gly Arg Ile Ala Tyr 
1650 1655 1660 
Val Leu Gly Leu Glu Gly Pro Ala Ile Thr Val Asp Thr Ala Cys Ser 
1665 1670 1675 1680 

Ser Ser Leu Val Ala Leu His Val Ala Ala Gly Ser Leu Arg Ser Gly 

1685 1690 1695 

Asp Cys Gly Leu Ala Val Ala Gly Gly Val Ser Val Met Ala Gly Pro 

1700 1705 1710 

Glu Val Phe Thr Glu Phe Ser Arg Gln Gly Ala Leu Ala Pro Asp Gly 

1715 1720 1725 

Arg Cys Lys Pro Phe Ser Asp Gln Ala Asp Gly Phe Gly Phe Ala Glu 

1730 1735 1740 

Gly Val Ala Val Val Leu Leu Gln Arg Leu Ser Val Ala Val Arg Glu 
1745 1750 1755 1760 

Gly Arg Arg Val Leu Gly Val Val Val Gly Ser Ala Val Asn Gln Asp 

1765 1770 1775 

Gly Ala Ser Asn Gly Leu Ala Ala Pro Ser Gly Val Ala Gln Gln Arg 

1780 1785 1790 

Val Ile Arg Arg Ala Trp Gly Arg Ala Gly Val Ser Gly Gly Asp Val 

1795 1800 1805 

Gly Val Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Val 

1810 1815 1820 

Glu Leu Gly Ala Leu Leu Gly Thr Tyr Gly Val Gly Arg Gly Gly Val 
1825 1830 1835 1840 

Gly Pro Val Val Val Gly Ser Val Lys Ala Asn Val Gly His Val Gln 

1845 1850 1855 

Ala Ala Ala Gly Val Val Gly Val Ile Lys Val Val Leu Gly Leu Gly 

1860 1865 1870 

Arg Gly Leu Val Gly Pro Met Val Cys Arg Gly Gly Leu Ser Gly Leu 

1875 1880 1885 

Val Asp Trp Ser Ser Gly Gly Leu Val Val Ala Asp Gly Val Arg Gly 

1890 1895 1900 

Trp Pro Val Gly Val Asp Gly Val Arg Arg Gly Gly Val Ser Ala Phe 
1905 1910 1915 1920 

Gly Val Ser Gly Thr Asn Ala His Val Val Val Ala Glu Ala Pro Gly 

1925 1930 1935 

Ser Val Val Gly Ala Glu Arg Pro Val Glu Gly Ser Ser Arg Gly Leu 

1940 1945 1950 

Val Gly Val Ala Gly Gly Val Val Pro Val Val Leu Ser Ala Lys Thr 

1955 1960 1965 

Glu Thr Ala Leu Thr Glu Leu Ala Arg Arg Leu His Asp Ala Val Asp 

1970 1975 1980 

Asp Thr Val Ala Leu Pro Ala Val Ala Ala Thr Leu Ala Thr Gly Arg 
1985 1990 1995 2000 

Ala His Leu Pro Tyr Arg Ala Ala Leu Leu Ala Arg Asp His Asp Glu 

2005 2010 2015 

Leu Arg Asp Arg Leu Arg Ala Phe Thr Thr Gly Ser Ala Ala Pro Gly 

2020 2025 2030 

Val Val Ser Gly Val Ala Ser Gly Gly Gly Val Val Phe Val Phe Pro 

2035 2040 2045 

Gly Gln Gly Gly Gln Trp Val Gly Met Ala Arg Gly Leu Leu Ser Val 

2050 2055 2060 

Pro Val Phe Val Glu Ser Val Val Glu Cys Asp Ala Val Val Ser Ser 
2065 2070 2075 2080 

Val Val Gly Phe Ser Val Leu Gly Val Leu Glu Gly Arg Ser Gly Ala 

2085 2090 2095 

Pro Ser Leu Asp Arg Val Asp Val Val Gln Pro Val Leu Phe Val Val 

2100 2105 2110 

Met Val Ser Leu Ala Arg Leu Trp Arg Trp Cys Gly Val Val Pro Ala 

2115 2120 2125 

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala Val Val Ala 

2130 2135 2140 

Gly Val Leu Ser Val Gly Asp Gly Ala Arg Val Val Ala Leu Arg Ala 
2145 2150 2155 2160 

Arg Ala Leu Arg Ala Leu Ala Gly His Gly Gly Met Val Ser Leu Ala 

2165 2170 2175 

Val Ser Ala Glu Arg Ala Arg Glu Leu Ile Ala Pro Trp Ser Asp Arg 

2180 2185 2190 

Ile Ser Val Ala Ala Val Asn Ser Pro Thr Ser Val Val Val Ser Gly 

2195 2200 2205 

Asp Pro Gln Ala Leu Ala Ala Leu Val Ala His Cys Ala Glu Thr Gly 

2210 2215 2220 

Glu Arg Ala Lys Thr Leu Pro Val Asp Tyr Ala Ser His Ser Ala His 
2225 2230 2235 2240 

Val Glu Gln Ile Arg Asp Thr Ile Leu Thr Asp Leu Ala Asp Val Thr 

2245 2250 2255 

Ala Arg Arg Pro Asp Val Ala Leu Tyr Ser Thr Leu His Gly Ala Arg 

2260 2265 2270 

Gly Ala Gly Thr Asp Met Asp Ala Arg Tyr Trp Tyr Asp Asn Leu Arg 

2275 2280 2285 

Ser Pro Val Arg Phe Asp Glu Ala Val Glu Ala Ala Val Ala Asp Gly 

2290 2295 2300 

Tyr Arg Val Phe Val Glu Met Ser Pro His Pro Val Leu Thr Ala Ala 
2305 2310 2315 2320 

Val Gln Glu Ile Asp Asp Glu Thr Val Ala Ile Gly Ser Leu His Arg 

2325 2330 2335 

Asp Thr Gly Glu Arg His Leu Val Ala Glu Leu Ala Arg Ala His Val 

2340 2345 2350 

His Gly Val Pro Val Asp Trp Arg Ala Ile Leu Pro Ala Thr His Pro 

2355 2360 2365 

Val Pro Leu Pro Asn Tyr Pro Phe Glu Ala Thr Arg Tyr Trp Leu Ala 

2370 2375 2380 

Pro Thr Ala Ala Asp Gln Val Ala Asp His Arg Tyr Arg Val Asp Trp 
2385 2390 2395 2400 

Arg Pro Leu Ala Thr Thr Pro Ala Glu Leu Ser Gly Ser Tyr Leu Val 

2405 2410 2415 

Phe Gly Asp Ala Pro Glu Thr Leu Gly His Ser Val Glu Lys Ala Gly 

2420 2425 2430 

Gly Leu Leu Val Pro Val Ala Ala Pro Asp Arg Glu Ser Leu Ala Val 

2435 2440 2445 

Ala Leu Asp Glu Ala Ala Gly Arg Leu Ala Gly Val Leu Ser Phe Ala 

2450 2455 2460 

Ala Asp Thr Ala Thr His Leu Ala Arg His Arg Leu Leu Gly Glu Ala 
2465 2470 2475 2480 

Asp Val Glu Ala Pro Leu Trp Leu Val Thr Ser Gly Gly Val Ala Leu 

2485 2490 2495 

Asp Asp His Asp Pro Ile Asp Cys Asp Gln Ala Met Val Trp Gly Ile

2500 2505 2510 

Gly Arg Val Met Gly Leu Glu Thr Pro His Arg Trp Gly Gly Leu Val 

2515 2520 2525 

Asp Val Thr Val Glu Pro Thr Ala Glu Asp Gly Val Val Phe Ala Ala 

2530 2535 2540 

Leu Leu Ala Ala Asp Asp His Glu Asp Gln Val Ala Leu Arg Asp Gly
2545 2550 2555 2560 

Ile Arg His Gly Arg Arg Leu Val Arg Ala Pro Leu Thr Thr Arg Asn

2565 2570 2575 

Ala Arg Trp Thr Pro Ala Gly Thr Ala Leu Val Thr Gly Gly Thr Gly 

2580 2585 2590 

Ala Leu Gly Gly His Val Ala Arg Tyr Leu Ala Arg Ser Gly Val Thr 

2595 2600 2605 

Asp Leu Val Leu Leu Ser Arg Ser Gly Pro Asp Ala Pro Gly Ala Ala 

2610 2615 2620

Glu Leu Ala Ala Glu Leu Ala Asp Leu Gly Ala Glu Pro Arg Val Glu 
2625 2630 2635 2640 
Ala Cys Asp Val Thr Asp Gly Pro Arg Leu Arg Ala Leu Val Gln Glu

2645 2650 2655 

Leu Arg Glu Gln Asp Arg Pro Val Arg Ile Val Val His Thr Ala Gly

2660 2665 2670 

Val Pro Asp Ser Arg Pro Leu Asp Arg Ile Asp Glu Leu Glu Ser Val

2675 2680 2685 

Ser Ala Ala Lys Val Thr Gly Ala Arg Leu Leu Asp Glu Leu Cys Pro 

2690 2695 2700

Asp Ala Asp Thr Phe Val Leu Phe Ser Ser Gly Ala Gly Val Trp Gly 
2705 2710 2715 2720

Ser Ala Asn Leu Gly Ala Tyr Ala Ala Ala Asn Ala Tyr Leu Asp Ala 

2725 2730 2735 

Leu Ala His Arg Arg Arg Gln Ala Gly Arg Ala Ala Thr Ser Val Ala

2740 2745 2750 

Trp Gly Ala Trp Ala Gly Asp Gly Met Ala Thr Gly Asp Leu Asp Gly 

2755 2760 2765 

Leu Thr Arg Arg Gly Leu Arg Ala Met Ala Pro Asp Arg Ala Leu Arg 

2770 2775 2780 

Ala Cys Thr Arg Arg Trp Thr Thr His Asp Thr Cys Val Ser Val Ala 
2785 2790 2795 2800

Asp Val Asp Trp Asp Arg Phe Ala Val Gly Phe Thr Ala Ala Arg Pro 

2805 2810 2815 

Arg Pro Leu Ile Asp Glu Leu Val Thr Ser Ala Pro Val Ala Ala Pro

2820 2825 2830 

Thr Ala Ala Ala Ala Pro Val Pro Ala Met Thr Ala Asp Gln Leu Leu

2835 2840 2845 

Gln Phe Thr Arg Ser His Val Ala Ala Ile Leu Gly His Gln Asp Pro

2850 2855 2860 

Asp Ala Val Gly Leu Asp Gln Pro Phe Thr Glu Leu Gly Phe Asp Ser
2865 2870 2875 2880 

Leu Thr Ala Val Gly Leu Arg Asn Gln Leu Gln Gln Ala Thr Gly Arg

2885 2890 2895

Thr Leu Pro Ala Ala Leu Val Phe Gln His Pro Thr Val Arg Arg Leu

2900 2905 2910 

Ala Asp His Leu Ala Gln Gln Leu Asp Val Gly Thr Ala Pro Val Glu

2915 2920 2925 

Ala Thr Gly Ser Val Leu Arg Asp Gly Tyr Arg Arg Ala Gly Gln Thr

2930 2935 2940 

Gly Asp Val Arg Ser Tyr Leu Asp Leu Leu Ala Asn Leu Ser Glu Phe 
2945 2950 2955 2960 

Arg Glu Arg Phe Thr Asp Ala Ala Ser Leu Gly Gly Gln Leu Glu Leu

2965 2970 2975

Val Asp Leu Ala Asp Gly Ser Gly Pro Val Thr Val Ile Cys Cys Ala

2980 2985 2990 

Gly Thr Ala Ala Leu Ser Gly Pro His Glu Phe Ala Arg Leu Ala Ser 

2995 3000 3005 

Ala Leu Arg Gly Thr Val Pro Val Arg Ala Leu Ala Gln Pro Gly Tyr

3010 3015 3020 

Glu Ala Gly Glu Pro Val Pro Ala Ser Met Glu Ala Val Leu Gly Val 
3025 3030 3035 3040 

Gln Ala Asp Ala Val Leu Ala Ala Gln Gly Asp Thr Pro Phe Val Leu

3045 3050 3055 

Val Gly His Ser Ala Gly Ala Leu Met Ala Tyr Ala Leu Ala Thr Glu 

3060 3065 3070 

Leu Ala Asp Arg Gly His Pro Pro Arg Gly Val Val Leu Leu Asp Val 

3075 3080 3085 

Tyr Pro Pro Gly His Gln Glu Ala Val His Ala Trp Leu Gly Glu Leu

3090 3095 3100 

Thr Ala Ala Leu Phe Asp His Glu Thr Val Arg Met Asp Asp Thr Arg 
3105 3110 3115 3120 

Leu Thr Ala Leu Gly Ala Tyr Asp Arg Leu Thr Gly Arg Trp Arg Pro 
3125 3130 3135

Arg Asp Thr Gly Leu Pro Thr Leu Val Val Ala Ala Ser Glu Pro Met 

3140 3145 3150 

Gly Glu Trp Pro Asp Asp Gly Trp Gln Ser Thr Trp Pro Phe Gly His

3155 3160 3165 

Asp Arg Val Thr Val Pro Gly Asp His Phe Ser Met Val Gln Glu His

3170 3175 3180 

Ala Asp Ala Ile Ala Arg His Ile Asp Ala Trp Leu Ser Gly Glu Arg
3185 3190 3195 3200

Ala 



<210> 16 
<211> 358 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 16 
340 345 350

His Cys Pro Val Glu Leu 
355 

<210> 17 
<211> 422 
<212> PRT 

<213> Micromonospora megalomicea 

<400> 17

Met Arg Val Val Phe Ser Ser Met Ala Ser Lys Ser His Leu Phe Gly 

1 5 10 15 

Leu Val Pro Leu Ala Trp Ala Phe Arg Ala Ala Gly His Glu Val Arg

20 25 30 

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Ile Thr Ala Ala Gly Leu

35 40 45 

Thr Ala Val Pro Val Gly Thr Asp Val Asp Leu Val Asp Phe Met Thr 

50 55 60 

His Ala Gly Tyr Asp Ile Ile Asp Tyr Val Arg Ser Leu Asp Phe Ser
65 70 75 80

Glu Arg Asp Pro Ala Thr Ser Thr Trp Asp His Leu Leu Gly Met Gln

85 90 95 

Thr Val Leu Thr Pro Thr Phe Tyr Ala Leu Met Ser Pro Asp Ser Leu 

100 105 110 

Val Glu Gly Met Ile Ser Phe Cys Arg Ser Trp Arg Pro Asp Trp Ser

115 120 125 

Ser Gly Pro Gln Thr Phe Ala Ala Ser Ile Ala Ala Thr Val Thr Gly

130 135 140 

Val Ala His Ala Arg Leu Leu Trp Gly Pro Asp Ile Thr Val Arg Ala
145 150 155 160

Arg Gln Lys Phe Leu Gly Leu Leu Pro Gly Gln Pro Ala Ala His Arg

165 170 175 

Glu Asp Pro Leu Ala Glu Trp Leu Thr Trp Ser Val Glu Arg Phe Gly 

180 185 190 

Gly Arg Val Pro Gln Asp Val Glu Glu Leu Val Val Gly Gln Trp Thr

195 200 205 

Ile Asp Pro Ala Pro Val Gly Met Arg Leu Asp Thr Gly Leu Arg Thr

210 215 220 

Val Gly Met Arg Tyr Val Asp Tyr Asn Gly Pro Ser Val Val Pro Asp 
225 230 235 240 

Trp Leu His Asp Glu Pro Thr Arg Arg Arg Val Cys Leu Thr Leu Gly 

245 250 255 

Ile Ser Ser Arg Glu Asn Ser Ile Gly Gln Val Ser Val Asp Asp Leu

260 265 270 

Leu Gly Ala Leu Gly Asp Val Asp Ala Glu Ile Ile Ala Thr Val Asp

275 280 285 

Glu Gln Gln Leu Glu Gly Val Ala His Val Pro Ala Asn Ile Arg Thr

290 295 300 

Val Gly Phe Val Pro Met His Ala Leu Leu Pro Thr Cys Ala Ala Thr 
305 310 315 320 

Val His His Gly Gly Pro Gly Ser Trp His Thr Ala Ala Ile His Gly

325 330 335 

Val Pro Gln Val Ile Leu Pro Asp Gly Trp Asp Thr Gly Val Arg Ala

340 345 350 

Gln Arg Thr Glu Asp Gln Gly Ala Gly Ile Ala Leu Pro Val Pro Glu

355 360 365 

Leu Thr Ser Asp Gln Leu Arg Glu Ala Val Arg Arg Val Leu Asp Asp

370 375 380 

Pro Ala Phe Thr Ala Gly Ala Ala Arg Met Arg Ala Asp Met Leu Ala 
385 390 395 400 

Glu Pro Ser Pro Ala Glu Val Val Asp Val Cys Ala Gly Leu Val Gly 
405 410 415

Glu Arg Thr Ala Val Gly 

420 

<210> 18 
<211> 323 
<212> PRT 

<213> Micromonospora megalomicea 
<210> 19
<211> 247 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 19 

Met Asn Thr Trp Leu Arg Arg Phe Gly Ser Ala Asp Gly His Arg Ala 
1 5 10 15
Arg Leu Tyr Cys Phe Pro His Ala Gly Ala Ala Ala Asp Ser Tyr Leu

20 25 30 

Asp Leu Ala Arg Ala Leu Ala Pro Glu Val Asp Val Trp Ala Val Gln

35 40 45 

Tyr Pro Gly Arg Gln Asp Arg Arg Asp Glu Arg Ala Leu Gly Thr Ala

50 55 60 

Gly Glu Ile Ala Asp Glu Val Ala Ala Val Leu Arg Asp Leu Val Gly
65 70 75 80 

Glu Val Pro Phe Ala Leu Phe Gly His Ser Met Gly Ala Leu Val Ala 

85 90 95 

Tyr Glu Thr Ala Arg Arg Leu Glu Ala Arg Pro Gly Val Arg Pro Leu 

100 105 110 

Arg Leu Phe Val Ser Gly Gln Thr Ala Pro Arg Val His Glu Arg Arg

115 120 125 

Thr Asp Leu Pro Asp Glu Asp Gly Leu Val Glu Gln Met Arg Arg Leu

130 135 140 

Gly Val Ser Glu Ala Ala Leu Ala Asp Gln Gly Leu Leu Asp Met Ser
145 150 155 160 

Leu Pro Val Leu Arg Ala Asp His Arg Val Leu Arg Ser Tyr Ala Trp 

165 170 175

Gln Ala Gly Pro Pro Leu Arg Ala Gly Ile Thr Thr Leu Cys Gly Asp

180 185 190 

Thr Asp Pro Leu Thr Thr Val Glu Asp Ala Gln Arg Trp Leu Pro Tyr

195 200 205 

Ser Val Val Pro Gly Arg Thr Arg Thr Phe Pro Gly Gly His Phe Tyr

210 215 220 

Leu Ala Asp His Val Gly Glu Val Ala Glu Ser Val Ala Pro Asp Leu 
225 230 235 240 

Leu Arg Leu Thr Pro Thr Gly 

245 

<210> 20 
<211> 189 
<212> PRT

<213> Micromonospora megalomicea 
<400> 20

Ile Arg Val Gln Asp Asp Asp Ala Asp Arg Leu Ser Arg Asp Glu Leu

1 5 10 15

Thr Ser Ile Ala Leu Val Leu Leu Leu Ala Gly Phe Glu Ala Ser Val

20 25 30 

Ser Leu Ile Gly Ile Gly Thr Tyr Leu Leu Leu Thr His Pro Asp Gln

35 40 45 

Leu Ala Leu Val Arg Lys Asp Pro Ala Leu Leu Pro Gly Ala Val Glu 

50 55 60 

Glu Ile Leu Arg Tyr Gln Ala Pro Pro Glu Thr Thr Thr Arg Phe Ala
65 70 75 80

Thr Ala Glu Val Glu Ile Gly Gly Val Thr Ile Pro Ala Tyr Ser Thr

85 90 95 

Val Leu Ile Ala Asn Gly Ala Ala Asn Arg Asp Pro Gly Gln Phe Pro

100 105 110

Asp Pro Asp Arg Phe Asp Val Thr Arg Asp Ser Arg Gly His Leu Thr 

115 120 125 

Phe Gly His Gly Ile His Tyr Cys Met Gly Arg Pro Leu Ala Lys Leu

130 135 140 

Glu Gly Glu Val Ala Leu Gly Ala Leu Phe Asp Arg Phe Pro Lys Leu 
145 150 155 160 

Ser Leu Gly Phe Pro Ser Asp Glu Val Val Trp Arg Arg Ser Leu Leu 

165 170 175 

Leu Arg Gly Ile Asp His Leu Pro Val Arg Pro Asn Gly

180 185 
<210> 21
<211> 33
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Synthetic nucleotide DNA duplex 

<400> 21 

taagaattcg gagatctggc ctcagctcta gac 33 

<210> 22 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Complementary oligo 
<400> 22 

aattgtctag agctgaggcc agatctccga attcttaat 39 

<210> 23 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 23 

ttgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 

tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 

cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180

gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 

gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 

aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 

ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 24 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 24 

ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 
tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 
cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180 
gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 
gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 
aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 
ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 25 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 
<221> misc_feature
<222> (1)...(528)

<223> Sequence with codon changes as described in the 

specification at page 99, line 22 thru page 101, line 23



<400> 25 

ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc 60 

tcggccgtca accaagacgg cgcgtcaaac ggcctcgccg cgccctccgg cgtcgcccag 120 

cagcgcgtca tacgccgcgc gtggggacgc gccggagtat cgggcggcga cgtcggagtc 180 

gtcgaggccc acggcaccgg cacccgcctc ggggatcccg tcgagctggg cgccctcctg 240

ggcacgtacg gcgtcggccg cggcggcgtc ggcccggtcg tcgtcggcag cgtcaaggcc 300 

aacgtcggcc acgtccaggc cgcggccggc gtcgtcgggg tcatcaaggt cgtcctcggc 360 

ctcggccgcg ggctggtcgg cccgatggtc tgccgcggcg gcctcagcgg cctcgtcgac 420 

tggtcgtccg gcggcctggt cgtcgcggac ggggtccgcg gctggccggt cggcgtcgac 480 

ggcgtccgcc ggggcggcgt ctcggcgttc ggcgtcagcg ggacgaat 528 



<210> 26 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 



<400> 26 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 

gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 

ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 

gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240 

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291



<210> 27 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 



<400> 27 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 

gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 

ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 

gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240 

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 



<210> 28 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 

<221> misc_feature 
<222> (1)...(291)

<223> Sequence with codon changes as described in the 

specification at page 99, line 22 thru page 101, line 23 



<400> 28 

cgtggagtgc gatgcggtcg tgtcgagcgt cgtcggcttc agcgtgctgg gcgtcctgga 60 

gggccgcagc ggcgccccga gcctggaccg cgtcgacgtg gtccagccgg tcctgttcgt 120 

ggtcatggtc agcctggccc gcctgtggcg ctggtgcggc gtggtcccgg ccgccgtggt 180 

cggccacagc cagggcgaga tcgccgccgc ggtcgtggcc ggcgtcctga gcgtcggcga 240

cggcgcccgc gtcgtggccc tgcgcgcccg cgccctgcgc gccctggccg g 291



<210> 29 
<211> 24 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> PCR primer 
<400> 29 

gaacaactcc tgtctgcggc cgcg 24 

<210> 30 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 30 

cggaattctc tagagtcacg tctccaaccg cttgtcgagg 40 

<210> 31 
<211> 51 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 31 

tctagactta attaaggagg acacatatga gcgagagcag cggcatgacc g 51 

<210> 32 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 32 

aacgcctccc aggagatctc cagca 25 

<210> 33 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 33 

aattcatagc ctaggt 16 

<210> 34 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 34 